Data Engineer

Location: Remote

*** Mention DataYoshi when applying ***

Position Summary

The Data Engineer will design, build, and deliver our next-generation eDiscovery intelligence products. Our products range from AI-enabled TAS (technology-assisted solutions) to eDiscovery SaaS managed services and IaaS. We leverage a modern cloud stack and are looking for cloud practitioners and innovators to help us continually evolve. You will build highly scalable, reliable, and fault-tolerant data pipelines and engage with our data science team daily.

Primary Responsibilities

  • Design, build, and manage preprocessing, validation, and configuration for the advanced analytics platform to support data science teams.
  • Collaborate with senior management, product management, and other engineers in the development of optimal data products.
  • Build and operate stable, scalable, and highly performant data pipelines that cleanse, structure, and integrate disparate datasets into a readable and accessible format for end-user analyses and targeting.
  • Develop tools to monitor, debug, and analyze data pipelines.
  • Design and implement data schemas and models that are scalable and portable.
  • Provide technical recommendations regarding buy vs. build decisions for different components of the data analytics infrastructure.
  • Perform other related duties as assigned.

Knowledge, Skills, and Behaviors

  • ETL processing experience using Python, C#, or Java.
  • Building scalable cloud environments that handle petabyte-scale data and operationalizing clusters with hundreds of compute nodes.
  • Building real-time data collection infrastructure including client SDKs will be a huge plus.
  • Experience operationalizing machine learning workflows to meet business requirements.
  • Open-source technologies such as Hadoop, Spark, Kafka, and YARN.
  • Containers and container orchestration platforms such as Kubernetes.
  • Experience working with data scientists to operationalize machine learning models.
  • Proficiency with agile development methodologies, shipping features every two weeks.


  • Degree in computer science or computer engineering, with 5+ years of experience in a data-related field.
  • Experience implementing big data workflows with cloud-native technologies (Azure is preferred).
  • Expertise working with both structured and unstructured data in a Big Data platform setting using a standard toolset.
  • Experience with data streaming tools such as Apache Kafka, AWS Kinesis, Spark Streaming, or similar.
  • ETL processing experience using modern tools and practices.
  • Knowledge of various data science techniques and experience deploying models built with these techniques into production environments.
  • Knowledge of and experience working with relational databases such as MS SQL or Hyperscale SQL.
  • Azure or AWS big data or architecture certification preferred.

Work Environment and Physical Demands

  • Duties are performed in a typical office environment while sitting at a desk or computer table. Duties require the ability to use a computer, communicate over the telephone, and read printed material.
  • Duties may require being on call periodically and working outside normal working hours (evenings and weekends).
  • Duties may require travel by automobile or airplane; approximately 5% of the time will be spent traveling.
  • Duties may require the ability to lift up to 10 lbs.


Offers you may like...

  • The Upside Travel Company, LLC

    Senior Data Engineer
  • PPL Corporation

    Senior Data Engineer- Remote
    Allentown, PA
  • Artemis Consulting Inc

    Senior AWS Cloud Data Engineer
    Washington, DC 20001
  • Illuminate Education

    Data Engineer [remote]
    Minneapolis, MN 55402
  • Atomic

    Data Engineer