Clinetic

Data Engineer

Spark SQL

November 16, 2023

Raleigh-Durham-Chapel Hill Area, United States

Job description

Clinetic is putting health data in motion for good! We are a health software and technology company helping unleash the potential of electronic health record systems for research, evidence generation, and new care delivery models. We are passionate about developing creative solutions to modernize the way health care is delivered and how clinical research is performed to ultimately improve the health and lives of patients. We are actively seeking new people to join our data-driven, collaborative, and solution-oriented team.

What You'll Do

Develop deep expertise in healthcare data
Explore and map EHR datasets to common data models
Work with product managers, engineers and customer data analytics teams
Create a cutting-edge distributed analytics platform that aggregates data from multiple sources
Build robust data pipelines that empower researchers to conduct clinical trials and disease surveillance in real time

Ideal Candidate

You use a combination of persistence, research, problem-solving skills, and experience to overcome obstacles.
You take pride in your work. You are attentive to detail, but also flexible.
You are available for and responsive to questions. You are professional and collegial in your communications.
You like being the person that others rely on.
You quickly learn new technologies as needed and recognize that you are engaged in timely, business-critical tasks.
You are transparent in what you do. You discuss, document, and commit your work as needed.
You care about good data models, abstractions, and readable code.
You enjoy optimizing high throughput applications and ensuring data quality.
You enjoy working in an Agile environment and welcome constructive feedback.
You approach problems with a product development mindset.

Requirements

Strong proficiency in Apache Spark, including Spark SQL, DataFrame API, and Spark Streaming.
Proficiency in working with large-scale relational database management systems (RDBMS) such as PostgreSQL and SQL Server.
Deep understanding of SQL optimization techniques, query execution plans, and indexing strategies.
Experience with query profiling, identifying bottlenecks, and fine-tuning queries to enhance efficiency.
Solid understanding of distributed computing concepts and data processing frameworks.
Experience with data modeling, ETL/ELT processes, and data integration techniques.
Proficiency in programming languages such as Scala, Java, or Python.
Hands-on experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and related data services.
Proven ability to take ownership and work independently.

Nice to Have

Experience working with NoSQL data stores like ElasticSearch or MongoDB
Experience working with healthcare data
Experience with Kubernetes
Experience with machine learning
Experience with synthetic data generation

Benefits

This is a full-time position based in Durham, NC, one of the highest-ranked cities in the country for growth, entrepreneurship, affordability, dining, and entertainment. As a rapidly growing startup, we offer a robust benefits package including the following: