Our mission is to be the catalyst for massive, measurable, data-informed healthcare improvement through:
Data: integrate data in a flexible, open & scalable platform to power healthcare’s digital transformation
Analytics: deliver analytic applications & services that generate insight on how to measurably improve
Expertise: provide clinical, financial & operational experts who enable & accelerate improvement
Engagement: attract, develop and retain world-class team members by being a best place to work
Role: Data Engineer
Team: Life Sciences
Location: US Remote
Travel: Up to 10-15%, US
The Data Engineer for the Life Sciences business will be responsible for working in client environments to augment existing professional services resources in building out client data, including the acquisition of source marts and harmonized data marts (DOS Marts). The Data Engineer will help the Life Sciences business achieve its goal of a homogeneous set of data elements across clients, enabling cross-client data analytics. Additionally, the Data Engineer will assist clients in extending the standard set of data available for each client, for instance by integrating clinical text data and data from patient registries.
This role is a great fit for someone with significant data management and acquisition experience in the healthcare space. The work the Data Engineer performs will have an immediate impact on healthcare providers and will also contribute to the mission of accelerating clinical innovation and precision medicine through novel Life Sciences partnerships.
Duties & Responsibilities
The Data Engineer is a big data engineer with extensive experience building end-to-end data platforms, data pipelines, and data flows, including data ingestion/integration, data storage, data transformation, data processing, data deployment, data operations, and data cataloging.
The Engineer should be able to design solutions and work closely with big data architects, big data developers, data scientists, and DevOps/DataOps engineers to develop a platform capable of executing operational, analytic, and predictive workloads that serves thousands of applications and supports machine learning deployment and inference.
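As a flavor of this pipeline work, the sketch below shows a minimal batch ingestion-and-transformation step in PySpark; the file paths, table names, and column names are hypothetical illustrations rather than actual client schemas.

```python
# Minimal sketch: batch ingestion + transformation with PySpark.
# Paths and column names below are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("encounter_source_mart").getOrCreate()

# Ingest: read raw encounter extracts from a (hypothetical) landing zone.
raw = spark.read.csv("/landing/encounters/*.csv", header=True, inferSchema=True)

# Transform: standardize column names and types so data elements
# line up consistently across clients.
encounters = (
    raw.withColumnRenamed("ENC_ID", "encounter_id")
       .withColumn("admit_date", F.to_date("ADMIT_DT", "yyyy-MM-dd"))
       .filter(F.col("encounter_id").isNotNull())
)

# Load: persist the harmonized output as a Parquet-backed mart table.
encounters.write.mode("overwrite").parquet("/marts/encounters")
```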
- Extensive experience as a data engineer and database developer, including building data-driven applications
- Good understanding of distributed systems and distributed databases
- Experience with ETL/ELT development, batch processing, and stream processing
- Familiarity with frameworks such as Spark and Kafka and the tooling around them
- Extensive experience with cloud data warehouses (Redshift, BigQuery, Snowflake, Azure Synapse)
- Experience with the Azure ecosystem, including data lakes (ADLS Gen2) and Azure SQL Data Warehouse
- Understanding of big data ingestion, integration, storage, and processing; transformation/ETL tools; and data storage formats
- Ability to debug, troubleshoot, and optimize data solutions in the big data ecosystem using tools such as Spark, Presto, Hive, and Kafka, as well as NoSQL databases, relational databases, and data warehouses
- Experience working with SQL engines on large datasets (Presto, Impala, Dremio, Spark SQL, Hive, Drill, Druid, and others)
- Knowledge of and experience working with DevOps and DataOps teams to develop processes and automate deployment
- Programming experience with one or more of Python, Java, or Scala
- Expertise in both intermediate- and advanced-level SQL query development
- Ability to understand and work with complex datasets and build data-modeling solutions around them
- Work with other team members (business analysts, data analysts, and data stewards) to understand requirements and build solutions
- Ability, passion, and aptitude to learn new programming and query languages and apply them to build data solutions
- Good understanding of tools in the DevOps ecosystem, including a basic understanding of Docker and CI/CD processes
- Strong working knowledge of Git
- Experience working with DevOps teams
- Experience with data orchestration tools such as dbt, Prefect, and Airflow (a minimal sketch follows this list)
- Experience with data warehousing, data modeling, data marts, data virtualization, and MPP-based engines such as Azure Synapse, Redshift, Vertica, BigQuery, and Snowflake
- Experience with relational databases such as SQL Server, Postgres, MySQL, MariaDB, and Oracle
- Experience working with at least one NoSQL database and the ability to develop a data model with at least one of the main types of NoSQL databases:
  - Key-value stores: Redis, DynamoDB, Riak
  - Document databases: MongoDB, CouchDB, Couchbase
  - Graph databases: Neo4j
  - Wide-column databases: Cassandra, HBase, Scylla
  - Time-series databases: InfluxDB, TimescaleDB
  - Search engines and databases: Elasticsearch, Solr
  - In-memory databases and in-memory data grids: Apache Ignite, GridGain
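For the orchestration tools mentioned in the list above, the following is a minimal sketch of a two-task Apache Airflow DAG showing how pipeline steps might be sequenced; the DAG id, task names, and callables are hypothetical placeholders (assuming Airflow 2.4+), not an actual Health Catalyst pipeline.

```python
# Minimal sketch of a daily orchestration DAG in Apache Airflow (2.4+).
# Task names and callables are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_source_mart(**context):
    # Placeholder: pull incremental rows from a client source system.
    pass


def build_dos_mart(**context):
    # Placeholder: transform source-mart rows into the harmonized mart.
    pass


with DAG(
    dag_id="client_data_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_source_mart",
        python_callable=extract_source_mart,
    )
    transform = PythonOperator(
        task_id="build_dos_mart",
        python_callable=build_dos_mart,
    )

    # Run the extract before the mart build each day.
    extract >> transform
```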
Education & Relevant Experience
- BS in Computer Science, Health Informatics, or related field
- 2+ years of experience working in a SQL-based data engineering role
- 2+ years of experience with clinical/healthcare data
- 2+ years of direct exposure to data from a leading EMR (Epic, Cerner, Meditech, etc.)
- Experience with programming/scripting languages such as Python is a plus
The above statements describe the general nature and level of work being performed in this job function. They are not intended to be an exhaustive list of all duties, and additional responsibilities may be assigned by Health Catalyst.
At Health Catalyst, we appreciate the opportunity to benefit from the diverse backgrounds and experiences of others. Because of our deep commitment to respect every individual, Health Catalyst is an equal opportunity employer.