Senior Data Engineer

Job description

AstraZeneca are looking for a Senior Data Engineer to help build and modernise our next-generation Drug Discovery platform!

AstraZeneca is investing heavily in our Augmented Drug Design-Make-Test-Analyse (DMTA) platforms as we seek to deliver better, differentiated Candidate Drugs into trials, faster, for greater patient benefit. We have made extraordinary strides in creating the AI and ML toolsets that will pave the way for this transformation, and the platforms that support new therapeutic modalities. The next part of this journey will involve firming up our data strategy and building the data pipelines that will enable us to increase the value of our new technologies in bringing small-molecule and other therapeutic modalities (e.g. peptides, oligonucleotides and conjugates) through the pipeline.

In Early Science, we have highly skilled scientists generating ideas and performing experiments in support of complex drug discovery projects. Our environment is driven by scientific and technical innovation, with a high degree of diversity in workflows, data, vendor solutions and in-house builds.

Your work will have a direct impact on the science we do, and you will have the opportunity to enable our scientists to undertake science that is not possible today.

About the role

Working alongside Software Engineers, Architects, Analysts and other IT and science partners, you will devise technical solutions and estimate, deliver and run operationally sustainable data flows in our DevOps teams. You will use your technical acuity to troubleshoot and provide innovative solutions that ensure smooth experiences for our users.

In this role you will join a globally distributed team of software engineers, support engineers and informaticians in our Augmented DMTA platform. You will be responsible for advising on and delivering an emerging data strategy that ensures the smooth flow of data through our systems. You will integrate data from new systems so that scientists can search, retrieve and manipulate experimental data across all therapeutic modalities.

We’re an expanding global development team that is moving as close as we can to a true DevOps model. The portfolio is a mixture of unique self-developed and Commercial Off-The-Shelf (COTS) software solutions. It’s a siloed environment where you will have the challenge and the chance to make the software and data experience seamless for the user.


  • Co-operate with groups (pods) of engineers distributed across multiple locations to ensure a hard-working team that consistently and iteratively delivers high-impact, high-quality solutions while maintaining high-quality operations.
  • Work across legacy relational databases and with S3/file/elastic data structures and facilitate modernisation of the estate in conjunction with software engineering.
  • Collaborate with information architects and colleagues from the Data Office to align with central data policies and expose information through data catalogues and ontology management systems.
  • Work with BAs, Engineers and Scientists to understand and estimate requirements, and provide tailored, complete data pipelining and ETL solutions, choosing the technology that best fits each need and keeping the result operationally stable and sustainable.
  • Work with project teams integrating new software (e.g. for entity registration) into downstream systems, ensuring optimised experiences and data fidelity.
  • Provide input into the technical direction of the platform, ensuring we have the underlying data structures and technology to enable DMTA scientists to work efficiently and accurately and to cope with a constantly evolving scientific environment.
  • Advocate and advance modern, agile software development practices, and help develop and voice support for a great data engineering culture.
  • Support more junior data engineers through mentoring, feedback and hands-on career development.
  • Collaborate in the transformation of an estate of siloed systems into an ecosystem that delivers great user experiences.


  • Solid experience of working with relational databases (especially Oracle, Postgres)
  • Familiarity with data modelling techniques and hands-on modelling experience
  • Experience of ETL tools (e.g. Talend, Informatica) and/or scientific data pipelining (e.g. Pipeline Pilot).
  • Experience of cloud services and tools
  • Proficiency in building scalable, high-availability analytics solutions
  • In-depth knowledge of one or more programming languages (we currently use Python, Java)
  • Experience of building unit tests, integration tests, system tests and acceptance tests
  • Experience in DevOps, using continuous integration and continuous delivery (we use Jenkins, Nexus, Git)
  • Experience of data analysis: profiling, investigating, interpreting and documenting data structures
  • Attention to detail and the ability to follow standards while contributing to the evolution of standards themselves
  • Excellent teamworking skills
  • Excellent verbal and written communication skills
  • A critical-thinking mindset and the capacity to propose solutions, not just highlight problems

The skills below would be useful but are not crucial:

  • Visualisation technologies (e.g. Spotfire, Power BI)
  • Deeper experience of, or certification in, AWS/Cloud services such as Redshift, Aurora, EMR, Spark, SQS, Lambda, S3, DMS
  • Vagrant, Docker, Kubernetes
  • Process tools (JIRA, Confluence and CI/CD tools e.g. Jenkins and Bamboo)
  • Willingness to share how you work with others

Why AstraZeneca?

At AstraZeneca, when we see an opportunity for change, we seize it and make it happen, because any opportunity, no matter how small, can be the start of something big.

So, what’s next? Are you already imagining yourself joining our team? Good, because we can’t wait to hear from you. Apply today!
