Senior Data Scientist - Machine Learning & AI in Genomics

Cambridge, UK

Competitive Salary & Benefits

What you’ll do

We have an opportunity for a talented Senior Data Scientist at the Centre for Genomics Research (CGR) to be part of ground-breaking research at the forefront of human genomics. You will join a dynamic team in CGR’s multidisciplinary genomics research environment, which comprises bioinformaticians, computational biologists, genome scientists, software engineers, postdoctoral researchers, and disease area specialists. You will also work closely with specialists in translational science, drug discovery, pre-clinical modelling, and clinical development. In this role you will be responsible for designing and implementing novel machine learning and deep learning methods applied to genomics. This includes identifying research problems that can be addressed with complex structured or unstructured genomic data, and developing appropriate models and analytical solutions.

  • Design and implement novel machine learning and deep learning methods to apply to genomics research questions

  • Extract research and/or business value from highly unstructured genomic data and metadata, including the UK Biobank resource of ~500,000 participants

  • Work with engineering and architecture teams to support large-scale data preparation, the optimisation of analytics platforms and the industrialisation of proven analytics methods

  • Coordinate and execute analyses within AstraZeneca’s Centre for Genomics Research

  • Deliver novel insights into the biology of disease, validation of new targets for medicines and improved selection of patients for clinical trials

  • Assess the scientific & technical integrity of algorithms and tools within the analysis pipeline

  • Maintain a well-developed knowledge of genomic science and technical advances in the international community

  • Present novel results at top-tier genetics and/or machine learning conferences and publish in high-impact journals

  • Collaborate with discovery and development teams to apply genomic analysis

  • Communicate results to a variety of audiences, technical and non-technical

  • Ensure your own work, and the work of the team, is compliant with Good Laboratory Practice, Safety, Health and Environment standards and all other internal AstraZeneca standards and external regulations


Essential requirements

  • PhD degree (or equivalent experience) in Computational Biology, Bioinformatics, Statistical Genetics, Biostatistics, or a related quantitative discipline

  • Solid experience in developing machine learning methodologies and building robust production machine learning systems

  • Experience in large-scale data analysis and applied statistics in genomics

  • Strong programming skills, with solid experience in one or more languages (e.g. Python, R, C++) and in open-source ML packages (e.g. scikit-learn, TensorFlow, Keras, PyTorch)

  • Strong knowledge of algorithms and data structures

  • Ability to communicate effectively with team members and non-experts, both verbally and through documentation

  • High-level understanding of, or interest in, the potential of genomics to impact drug discovery

  • Ability to prioritise and problem-solve

  • Excellent interpersonal skills and willingness to work within a team in a quickly evolving environment

  • Track record of peer-reviewed publications in high-level scientific journals

  • Passion for applying machine learning to the life sciences domain


Desirable requirements

  • Previous experience in a similar role

  • Experience in Unix environments and Bash scripting

  • Experience in high-performance and/or cloud computing

  • Experience in analysing other omics data (e.g. transcriptomics, proteomics, metabolomics)

  • Familiarity with genomics studies involving one of AstraZeneca’s core therapeutic areas

