PriceHubble

SENIOR DATA SCIENTIST - DATA INTELLIGENCE TEAM

Machine Learning Modeling

October 1, 2021

Paris (75), France

Job description

Data is at the core of PriceHubble. We process a wide variety of data from multiple sources. As a data scientist in the data-intelligence team, you will have three main missions:

First, to augment the data we have via machine learning prediction.
Second, to develop techniques to measure, assess, and improve the quality of the data we have.
Third, to develop matching algorithms for linking data from heterogeneous sources.

As a senior data scientist, you are highly motivated by the following questions:

Before doing standard machine learning, how do I build a strong labeled data set from scratch?
Garbage in, garbage out: how do I measure the quality of labels in a data set, and how do I improve it when I have very few labels to start with?
How can I go from no labels to the point where state-of-the-art machine learning can finally be leveraged?
How do I plan research projects spanning 3 months to 1 year in a way that structurally mitigates risk?
What should be the next steps in a research project? Where should we focus research efforts? The models, the labels/training data, feature-engineering, post-processing, or elsewhere?

These questions are, in our opinion, the new frontier in data science. You will be joining a team that specializes in this topic, with, among other things, advanced experience in crowd-sourcing, matching problems, ensemble modeling, and statistical estimation. Our technologies and tools are just getting started. Feeling excited about it? Want to be part of the adventure? Hop in!

Responsibilities

Mentor more junior team members
Define roadmap & approaches for research projects
Actively mitigate risks in machine learning projects by attacking high-risk items first and making sure projects fail fast if they are likely to hit a structural blocker
Apply machine learning methods to augment data sets
Develop and improve models for cross-linking heterogeneous data sources
Analyse and detect problems in our estimators
Correct blind spots in our data-labelling
Deploy, validate, and fine-tune crowd-sourcing jobs for acquiring labels

Requirements

MSc or PhD in Computer Science, Applied Mathematics, or a related field, with strong experience in machine learning and/or data science.
3-5 years of experience in a data science, research (incl. PhD), or quantitative role.
In-depth understanding of basic data structures and algorithms.
Strong analytical skills with the ability to collect, organise, and analyse significant amounts of data with attention to detail and accuracy.
Strong programming experience with Python, and the ability to write quality production code.
Experience with crowd-sourcing, active learning, semi-supervised learning, ensemble modeling, or matching problems is a plus.
Experience with the ETL and data processing tools we use is an advantage (pandas, Airflow, PySpark).
Experience with standard ML frameworks is also a plus (scikit-learn, TensorFlow, PyTorch, ...).
Comfortable working in English: you read it very well and have a good spoken command of it.