Job Description
Job Title: Data Scientist
Type: Direct Hire
Work Location: Remote position, with occasional travel required.
Job Overview
We are looking for Data Scientists who can develop state-of-the-art, scalable, self-learning systems for information extraction from unstructured content (text, image, speech, and video). The major duty is to develop AI/NLP solutions for business problems using in-production (feedback) data and state-of-the-art tools and techniques.
Essential Job Responsibilities
- Design, develop and implement solutions for a wide range of NLP use cases involving classification, extraction and search on unstructured text data
- Create and maintain state-of-the-art, scalable NLP solutions in Python/Java/Scala for multiple business problems. This involves:
  - Choosing the most appropriate NLP technique(s) based on business needs and available data
  - Performing data exploration and innovative feature engineering
  - Training and tuning a variety of NLP models and solutions, including regular expressions, traditional NLP models, and SOTA transformer-based models
  - Augmenting models by integrating domain-specific ontologies and/or external databases
  - Reporting on and monitoring solution outcomes
- Work experience with document-oriented databases such as MongoDB
- Collaborate with the ML engineering team to deploy NLP solutions in production, both on premises and in the cloud
- Interact with clients and internal business teams to assess solution feasibility and to design and develop solutions
- Open to working across different domains, such as Insurance, Healthcare, and Financial Services
Required Skills & Experience
- 1-6 years of relevant experience
- Experience (including graduate school) in training machine learning models and in applying and developing text mining and NLP techniques
- Exposure to OCR and computer vision
- Experience (including graduate school) with Natural Language Processing techniques is required
- Hands-on experience with Natural Language Processing tools such as Stanford CoreNLP, NLTK, spaCy, Gensim, TextBlob, etc.
- Experience or familiarity with document clustering in supervised and unsupervised scenarios
- Expertise in at least two state-of-the-art NLP techniques such as BERT, GPT, XLNet, etc.
- Applied experience of machine learning algorithms using Python
- Organized, self-motivated, disciplined, and detail-oriented
- Production-level coding experience in Python is required
- Ability to read recent ML research papers and adapt those models to solve real-world problems
- Experience with any deep learning framework, such as TensorFlow, Caffe, MXNet, Torch, or Theano
Big Bonus Points If You Have
- Hands-on experience with cloud technologies on AWS/Microsoft Azure
- Experience with optimization on GPUs
- Experience in extracting content from documents
- Engineering degree from a top-tier university
Job Requisition #38697
A reasonable estimate of the Base Salary for this role is $110,000 to $115,000 per year plus Commission (uncapped). The disclosed pay range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. Compensation decisions depend on the facts and circumstances of each case, such as skills and experience levels.