You will design and implement tools and solutions for continuous data validation in ML pipelines for various Orange projects. You will recommend the best solutions, and support and advise Orange teams (data scientists, data analysts, BI teams, data owners, etc.).
Working under the direction of the Data Quality Product Owner, you will calibrate and qualify tools (commercial tools, internally developed tools, open source tools, cloud tools, etc.) and adapt them to the needs of customers or users.
You will work closely with other Data stakeholders and help strengthen the links between data quality and data catalogs, data lineage, DataOps pipelines, etc.
You will also help explain the value of continuous data quality validation to all Orange teams.
You will investigate methods to improve and measure the performance of data quality tools and will be free to propose new approaches.
Your job combines the following activities:
Understand expectations in terms of data quality
Analyze each project's needs and characteristics
Propose solutions, implement, integrate, test and transfer the chosen solution
Produce regular reports for customers, the Data Quality Product Owner and management
Data Quality evangelist:
Communicate on data quality stakes and solutions
Produce cookbooks or demos
Learn about, test, adapt or create new data quality solutions
You have at least 2 years of experience in data engineering (building data pipelines, industrialization of data science applications)
Experience with data validation tools such as TensorFlow Data Validation, Amazon Deequ or Great Expectations is a plus
You are convinced of the importance of data quality and you are highly motivated to address the corresponding tasks
You are results-oriented and you like to solve real-world problems
You have already worked in agile mode and have a taste for the consulting business
Experience working across organizational boundaries, both locally and globally, is an asset
You have strong skills in Python, Java or Scala; knowledge of an ETL tool (Talend, Informatica, etc.) is a plus
Excellent analytical skills with the ability to maintain attention to detail
Excellent communication skills, both written and oral
Great interpersonal and listening skills
Knowledge of ML algorithms, techniques and frameworks (deep learning, TensorFlow, Keras, etc.)
You have a good knowledge of modern development processes and tools (DevOps, Git, CI/CD...)
You have a basic knowledge of data architecture and cloud environments (GCP, for instance)