Design and develop production-level search applications that make it easy to search content within a large medical free-text data asset.
Work with EHR data across teams of ETL and NLP engineers, data scientists, researchers, and clinicians to provide search services that meet a high standard of data quality control.
Data Engineer to Build Advanced Search Capability using the Elasticsearch stack
Applicant MUST have the following skills:
Elasticsearch stack knowledge (demonstrable experience in building search capability tooling using Elasticsearch).
Python programming knowledge and experience.
Apache Spark knowledge and experience, in particular with the PySpark API.
Data pipeline experience.
Excellent communication skills.
Candidate would be part of a small core NLP Team with 15 core team members (data scientists, project manager, medical informaticists, data analysts), with support from 12 clinical annotators integrated into the team via a third-party vendor.
What are the top 5-10 responsibilities for this position?
Demonstrable senior proficiency level and knowledge of the Elasticsearch stack.
Programming experience, including solid Python experience, following software engineering best practices.
Experience building and maintaining data pipelines and data assets.
Experience building dashboards and user interfaces using Kibana or other visualization tools.
Experience with distributed data processing frameworks such as Spark or MapReduce.
Experience as an individual contributor, hands-on developer, non-manager role executing on engineering projects as a primary job responsibility.
Demonstrated knowledge of data management best practices.
Currently, the main technologies we are using are Apache Spark, Hadoop, Hive, Luigi, and Python (and a little bit of Scala), and the platform we use is the on-prem Hadoop cluster.
Candidates should be solid with at least some of these technologies, and follow good engineering practices, such as testing, code reviews and putting in place monitoring systems like dashboards or alerts.
Experience with dashboard development in Elasticsearch.
Experience with data pipeline frameworks such as Airflow, Luigi or Oozie.
Experience with cloud-based computing (AWS or Azure).
Familiarity with EHR data and standards (HL7 or FHIR).
Experience with non-relational databases.
Experience with code and process documentation.
Experience with explaining, educating, presenting and training non-engineers on engineering concepts and processes.
Experience with continuous integration and delivery.
Where is the work to be performed?
We are seeking a Data Engineer who is eager to tackle the challenges of processing vast amounts of EHR data originating from multiple sources.
Candidate will need to develop a deep understanding of the data and drive efforts to maintain and improve data quality and usability.
Candidate should understand the importance and value of writing maintainable, documented, and well-tested code throughout the entire product lifecycle.
Above all, candidate should be curious about what is possible in healthcare with the right tools and infrastructure.