Data is at the core of PriceHubble. We process a wide variety of data from multiple sources. As a data scientist in the data-intelligence team, you will have three main missions:
- First, to augment the data we have via machine learning prediction.
- Second, to develop techniques to measure, assert, and improve the quality of the data we have.
- Third, to develop matching algorithms for linking data from heterogeneous sources.
As a senior data scientist, you are highly motivated by the following questions:
- Before doing standard machine learning, how do I build a strong labeled data set from scratch?
- Garbage in, garbage out: how do I measure the quality of labels in a data set? How do I improve it when I have very few labels to start with?
- How can I go from no labels to the point where state-of-the-art machine learning can finally be leveraged?
- How do I plan research projects spanning 3 months to 1 year in a way that structurally mitigates risk?
- What should the next steps in a research project be? Where should we focus research efforts: the models, the labels and training data, feature engineering, post-processing, or elsewhere?
These questions are, in our opinion, the new frontier in data science. You will be joining a team that specializes in this topic, with advanced experience in, among other things, crowd-sourcing, matching problems, ensemble modeling, and statistical estimation. Our technologies and tools are just getting started. Feeling excited about it? Want to be part of the adventure? Hop in!
Responsibilities
- Mentor more junior team members
- Define roadmap & approaches for research projects
- Actively mitigate risks in machine learning projects by tackling high-risk items first and making sure projects fail fast if they are likely to hit a structural blocker
- Apply machine learning methods to augment data sets
- Develop and improve models for cross-linking heterogeneous data sources
- Analyse and detect problems in our estimators
- Correct blind spots in our data-labelling
- Deploy, validate, and fine-tune crowd-sourcing jobs for acquiring labels
Requirements
- MSc or PhD in Computer Science, Applied Mathematics, or a related field, with strong experience in machine learning and/or data science.
- 3-5 years of experience in a data science, research (incl. PhD), or quantitative role.
- In-depth understanding of basic data structures and algorithms.
- Strong analytical skills with the ability to collect, organise, and analyse significant amounts of data with attention to detail and accuracy.
- Strong programming experience with Python, and the ability to write production-quality code.
- Experience with crowd-sourcing, active learning, semi-supervised learning, ensemble modeling, or matching problems is a plus.
- Experience with the ETL and data processing tools we use (pandas, Airflow, PySpark) is an advantage.
- Experience with standard ML frameworks (scikit-learn, TensorFlow, PyTorch, ...) is also a plus.
- Comfortable working in English: you read it very well and have a good spoken command of it.
- We are interested in every qualified candidate who is eligible to work in the European Union, but we are not able to sponsor visas.
Benefits
On top of joining a team of ambitious, qualified people, you will also enjoy the following benefits:
- Flexible work hours
- Competitive salary
- Casual dress code
- L&D program
- Well-located offices
- Free snacks, fruits, coffee, beers, sodas