Medidata Solutions

Senior Streaming Data Engineer

Job description

Medidata: Conquering Diseases Together


Medidata is leading the digital transformation of life sciences, creating hope for millions of patients. Medidata helps generate the evidence and insights to help pharmaceutical, biotech, medical device and diagnostics companies, and academic researchers accelerate value, minimize risk, and optimize outcomes. More than one million registered users across 1,400 customers and partners access the world's most-used platform for clinical development, commercial, and real-world data. Medidata, a Dassault Systèmes company, is headquartered in New York City and has offices around the world to meet the needs of its customers. Discover more at www.medidata.com.


Your Mission:

Medidata is the leader in developing technologies that allow our customers to get new medicines to patients faster. Building on our long history of delivering world-class clinical applications to the life sciences industry, Medidata’s Data Fabric organization is staffed with a passionate team of technology and scientific experts who tackle the industry’s most difficult technical challenges in order to push the boundaries of possibility for our clients and, most importantly, for patients. Together we can deliver meaningful, advanced digital transformation to the industry in order to achieve our vision.

The Senior Streaming Data Engineer role is an individual contributor position. Nonetheless, the role will involve some technical guidance of more junior streaming data engineers in the course of the team's development activities.

As a Senior Streaming Data Engineer, you will:

  • Devise and implement event-driven, distributed, stateful data pipelines using Flink and Kafka in a Kubernetes environment that integrate and analyze diverse sources of clinical, sensor and real-world data to support biomarker discovery;

  • Contribute to the development of incremental online machine learning algorithms to solve out-of-core analytical problems;

  • Proactively support and enhance the use, governance and evolution of a growing, cross-functional, multi-format type registry of schemas for streaming data algorithms and APIs;

  • Extend and enhance data APIs to communicate meaningful insights from streaming sources and make them accessible to patients, clinical research professionals and other stakeholders;

  • Act as an expert technical resource on streaming data technologies in cross-functional interactions with engineers and analysts across the R&D organization; and

  • Document, compare, critique, recommend and advocate alternative data architecture and data modeling solutions.

Your Competencies:

  • Strong experience working and deploying in the AWS Cloud, especially its file and data-oriented services

  • Good experience using Docker and some experience working with Kubernetes

  • Significant experience developing event-driven data pipelines using Kafka and Flink at scale

  • Extensive experience working with enterprise data models and data-oriented APIs (experience using GraphQL a plus)

  • Familiarity with schema registries for serialization and deserialization of event streams using, for instance, Avro and Protobuf

  • Deep experience using a variety of relational and non-relational database technologies, including, for instance, Postgres, MySQL, Cassandra, DynamoDB, Athena, Elasticsearch, and Redis

  • Expert-level SQL coding knowledge, including writing and optimizing complex queries

  • Demonstrable coding proficiency in at least one language other than SQL, such as Scala, Python, Java, Go, TypeScript, or Rust, in the context of data-oriented problems

  • Some experience using graph-based semantic models and technologies like RDF, OWL, SHACL, TinkerPop, Amazon Neptune, Stardog, or AnzoGraph

  • Experience applying agile software development methodology to enterprise data engineering, with tools like Git, Jira, Travis CI and others

Your Education and Experience:

  • Bachelor's degree in science, engineering, math, data or a relevant field

  • At least five (5) years of experience working in software development and at least two (2) years of experience working in event-driven data engineering or data science with relevant big data streaming technologies

  • Must be located in the continental US

Medidata is making a real difference in the lives of patients everywhere by accelerating critical drug and medical device development, enabling life-saving drugs and medical devices to get to market faster. Our products sit at the convergence of the Technology and Life Sciences industries, one of the most exciting areas for global innovation. Nine of the top 10 best-selling drugs in 2017 were developed on the Medidata platform.

Medidata Solutions has powered more than 17,000 clinical trials, giving us the largest collection of clinical trial data in the world. With this asset, we pioneer innovative, advanced applications and intelligent data analytics, bringing an unmatched level of quality and efficiency to clinical trials and enabling treatments to reach waiting patients sooner.


Medidata Solutions, Inc. is an Equal Opportunity Employer. Medidata Solutions provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability status, protected veteran status, or any other characteristic protected by the law. Medidata Solutions complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities.
