Your Role and Responsibilities

As a Data Scientist, you will be part of our growing team of data scientists and experts. You will be responsible for expanding and optimizing our data models, prediction algorithms, correlation algorithms, and text analytics models. You will support our software developers and data engineers in building and enhancing models. You must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products. The right candidate will be excited by the prospect of optimizing, or even redesigning, our real-time anomaly detection, correlation, and forecasting models. You will be working on a Big Data architecture, and the solutions you develop must be highly scalable and run in a Kubernetes container environment.

The skills we look for, and some of the solutions we work on, include the following:
  • Proficiency in at least one statistical programming language (R, Python), with a strong preference for Python
  • Deep expertise in statistical, machine learning, and deep learning techniques, including but not limited to Classification, Regression, Anomaly Detection, Multivariate Correlation, Forecasting, Optimization, Topic Modelling, Clustering, False Positive/False Negative Reduction, Class Imbalance Problems, Novelty Detection, Causal Inference, Statistical Tests, Evaluation Methodologies, etc.
  • Thorough understanding of and experience with ML algorithms, including but not limited to Decision Tree, Random Forest, Naive Bayes, Support Vector Machines, XGBoost, Logistic Regression, etc.
  • Thorough understanding of and experience with neural networks and deep learning (CNN, RNN, LSTM, autoencoders, HTM, etc.)
  • Experience with structured, semi-structured, and high-dimensional data for data mining and information retrieval purposes
  • Strong knowledge of Python and the ability to develop production-ready solutions with it
  • Strong written and oral communication, structured problem-solving, and storyboarding skills
  • Ability to translate broad business strategies into clear, specific, analytics-led use cases and to design business deliverables and solutions
  • Real-time anomaly detection solutions that proactively identify service-impacting incidents and prevent system downtime, built on an ensemble of deep learning and LSTM/HTM models
  • Natural Language Processing for entity extraction, classification, topic clusters and relationship extraction
  • Text analytics on human-generated service management tickets, correlated with ITIL service management event tickets to reduce event noise. This includes applying natural language classification and RNN algorithms for automatic ticket routing, bias detection, ticket de-duplication, cognitive ticket analysis, and actionable insights
  • Log Analysis - text mining, message clustering/templatization, logs-to-metrics conversion, anomaly detection, event annotation, and sequencing
  • Learning the log message sequence of each mainframe batch job and identifying anomalies during job runs using sequence mining techniques, to provide early warnings and alerts
  • Cloud Migration - pattern-based discovery optimization: identifying potential business application boundaries from Cloudscape data using an algorithmic approach

Required Technical and Professional Expertise

  • Degree in Statistics, Mathematics, Computer Science or another quantitative field with 8-10 years of experience in manipulating data sets and building statistical models
  • Experience using statistical computer languages (R, Python, SQL, etc.) to manipulate data and draw insights from large data sets
  • Experience creating and using advanced machine learning algorithms and statistical techniques such as regression, simulation, scenario analysis, modeling, clustering, decision trees, neural networks, etc.
  • Experience visualizing and presenting data for stakeholders using Excel, Power BI, Tableau, etc.
  • Experience with data and distributed computing tools such as Hive, Spark, MySQL, etc.
  • Strong knowledge of Java or Python and general software development skills (source code management, debugging, testing, deployment, etc.)
  • Experience querying databases to draw insights from large data sets
  • Deep understanding of NLP techniques for text representation and semantic extraction, as well as the related data structures and modelling
  • Experience applying deep learning to natural language processing tasks
  • Experience with open-source NLP toolkits such as CoreNLP, NLTK, gensim, etc.
  • Experience with open-source ML/DL/math toolkits such as scikit-learn, WEKA, MLlib, PyTorch, TensorFlow, NumPy, etc.

Preferred Technical and Professional Experience

  • Bachelor’s degree in Computer Science, Mathematics, Physics, Computational Linguistics, or related field
  • Experience with open-source distributed data processing frameworks, such as Spark
  • Experience working in a Linux environment
  • Experience working on a development team building products
  • Experience with presenting complex data science processes/information to non-data scientists
  • Experience with Information Retrieval and relevant tools such as Lucene, Elasticsearch, Solr
  • Experience conducting projects end to end, from requirements gathering, annotation, and modeling through NLP output deliverables, as well as managing internal and external clients
  • Prioritization skills; ability to manage ad-hoc requests in parallel with ongoing projects

Required Education

Bachelor's Degree

Preferred Education

Master's Degree



