WebMD and its affiliates are an Equal Opportunity/Affirmative Action employer and do not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.
Description: PulsePoint’s award-winning platforms accelerate data and programmatic technology to deliver contextually relevant and personalized health information. We help brands and agencies better understand audience engagement, and we are revolutionizing health decisions through real-time data. As a member of our Data Science Engineering team, the Machine Learning Engineer, Natural Language Processing, will focus on the following:
Improving page contextualization technology: working with healthcare topic detection algorithms, keyword and phrase extraction, and general and aspect-based sentiment analysis.
Experience: 2+ years
In addition to the above, they will work with the greater Data Science/Engineering teams on:
- Improving existing or developing new traffic segmentation algorithms and estimations of bid landscapes within each segment;
- Optimizing real-time bidding strategies and auction mechanics to spend ad budgets efficiently while delivering campaign targets under various constraints;
- Supporting and enhancing the existing work on health user profiling, prediction, and targeting tools;
- Contributing to projects relating to patient/physician identity for cross-device tracking, profiling, and targeting;
- Supporting the existing codebase for data integration and providing production support for our core models.
These are the things that we'll be looking for from a candidate:
- 5+ years of NLP or relevant contextualization experience. (Note: please provide detail on this important requirement in your cover letter.)
- Advanced knowledge of Python, including NumPy and pandas.
- Ability to optimize and speed up code.
In addition to the above, you'll need strong knowledge in the following areas (each is listed with the specific topics we'd like you to have exposure to):
- Algorithms and Data Structures: sorting, search trees, binary heaps, tries; time and memory complexity of algorithms.
- Probability & Statistics: Markov processes and their stationary distributions; stochastic matrices and the properties of their eigenvalues; Bayesian inference and conjugate distributions; two-sample hypothesis testing.
- ML & DS: dimensionality reduction; geometry of PCA and SVD; geometry of L1 and L2 regularization (why does L1 result in feature selection?); decision trees; collaborative filtering; Thompson sampling; MCMC; boosting (biases in boosted decision trees); bagging.
- Neural Networks: embeddings; encoders; dropout; CNNs and RNNs; internal covariate shift.