Job Summary: The Data Engineer is responsible for developing and optimizing "big data" pipelines, architectures, and data sets for Populytics, as required for population health management, provider profiling, clinical initiatives, medical expense budget tracking, and other applications. The Data Engineer maintains complex technology infrastructure and collaborates with clinical and business analysts, web developers, data scientists, and other customers to implement new features and plan for future projects. Data Engineers develop the data pipeline using proven design principles, design patterns, and automated testing to continuously enhance their software. They participate in group meetings with other departments to clarify processing requirements and designs, and they must remain current on relevant technologies and industry trends to sustain a strong technical direction for Populytics.

The Data Engineer develops and maintains an optimal data pipeline architecture, using SQL and big data technologies, for the extraction, transformation, and loading of data from a wide variety of sources, including but not limited to medical and pharmacy claims, HR, lab, EMR, provider, and payer systems. Responsibilities include contributing to the development or acquisition, use, and support of appropriate processes and tools for: secure, efficient, and reliable data exchange; data gap analysis and transformation; data profiling and auditing; data integration across time periods or data sources; generating input files for and consuming output files from advanced analytics tools; and feeding the outputs to reporting data marts in a manner that satisfies resource and performance constraints.
The Data Engineer is also responsible for developing and maintaining audits and monitoring tools that utilize the data pipeline to provide actionable insights into data accuracy, operational efficiency, volume or cost fluctuations, and other key business performance metrics. The Data Engineer continually identifies and develops internal process improvements: automating manual processes, optimizing data delivery, and redesigning infrastructure for greater scalability, extensibility, and maintainability. The Data Engineer works collaboratively with management and business partners, including lead and senior data engineers, clinical and business analysts, web developers, and data scientists, to assist with data-related technical issues and support their data infrastructure needs, and routinely conducts root cause analyses on internal and external data and pipeline processes to answer specific business questions and identify opportunities for improvement.
Minimum Requirements: Work requires the level of knowledge normally attained through completion of a Bachelor's degree in Software Engineering, Computer Science, or Computer & Information Science.
Minimum Experience: Three (3) years of progressive experience as a data wrangler, data engineer, or programmer/analyst. Must have working experience conducting root cause analyses on internal and external data and processes to answer specific business questions and identify opportunities for improvement, as well as experience manipulating, processing, and extracting value from large, disconnected datasets. Must possess working knowledge of programming with object-oriented or functional scripting languages such as Java, Python, C#, and Scala, as well as working knowledge of SQL and query authoring. Must have work experience with relational SQL and NoSQL databases, such as MySQL, SQL Server, Postgres, and HBase. Must also be familiar with data pipeline and workflow management tools such as Oozie, Azkaban, SSIS, and Pentaho, and with big data tools such as Hadoop, Spark, Hive, Hive LLAP, and MapReduce programming.
Work requires the ability to analyze and solve systems automation problems and to analyze and interpret data. Work requires the ability to work well in a team environment and to be self-motivated and self-disciplined, carrying out responsibilities to completion and functioning under shifting priorities, pressure, and deadlines. Work requires the ability to communicate clearly in both verbal and written form, accurately and concisely describing problems, solutions, specifications, and plans.
Preferred Qualifications: Preferred Experience: Minimum requirements plus working experience building processes that support data validation, transformation and integration, data structures, and workload management. Familiarity with building "big data" pipelines, architectures, and data sets is preferred. Familiarity with medical and Rx claims data, lab data, EMR data, clinical terminology, clinical guidelines, clinical outcomes, and healthcare utilization measures is a definite bonus, as is familiarity with VMware.
Licensure and Certifications: Hortonworks Certified Associate; Clarity Data Model - Epic Care Ambulatory Certification; Clarity Data Model - Epic Care Inpatient Certification