Data Engineer

Location: Charlestown, MA

*** Mention DataYoshi when applying ***

Data Engineer - (3161477)

Home Base seeks a data engineer, starting in April 2021, to support a variety of database management, ETL scripting, and data validation tasks. These include, but are not limited to: querying databases, restructuring data, cleaning and validating data, performing manual ETL tasks, automating ETL tasks using tools and custom scripting, full pipeline management and monitoring, improving systems and processes, and documenting data systems. The qualified candidate will be highly detail-oriented and have a strong interest in and aptitude for data management and engineering. Some specific focus areas will be determined based on the candidate's skills and interests.

The successful candidate must be highly organized, motivated, and able to thrive in a fast-paced team environment and must enjoy the challenge of a dynamic environment with evolving needs. It is extremely important that the candidate possess the ability to carefully keep track of multiple work streams.

Relevant activities include, but are not limited to the following:

  • Achieving an extremely detailed understanding of our current data ecosystem, including its structure, data meaning, history, flow/processing, and challenges
  • Utilizing, improving, and constructing ETL tools
    • Running current SQL, Python, PHP, and/or Tableau Prep ETL scripts
    • Using various monitoring and evaluation methods to validate that data flowing through these pipelines is accurate and troubleshooting/addressing issues when they are discovered
    • Improving these scripts (ETL and validation) and integrating them further into various data pipelines to achieve greater efficiency, reliability, and functionality
    • Constructing new ETL tools as necessary/able, including a major rewrite of a family of old PHP pipelines in Python
  • Data Cleaning
    • Writing queries and scripts to identify data quality problems
    • Investigating the root cause of data quality problems
    • Working with the relevant team members to determine appropriate data remediation and process improvement plans
    • Developing queries and scripts as needed to repair data in bulk
    • Supporting a dashboard that automatically monitors for certain critical data quality problems in production, independent of ETL processes
  • Additional Responsibilities
    • Support the team as needed with data querying, processing, analysis and reporting for both regular and ad-hoc requests from clinical, executive, and external audiences
    • Research potential new data engineering solutions, analyze feasibility, and assist technical leadership in road-mapping the evolution of our data infrastructure
    • Create and maintain documentation across our data ecosystem

Qualifications:

  • Background
    • Degree in Health Informatics, Computer Science, Statistics, Mathematics, Engineering, or a similar field
    • Familiarity with behavioral health clinical practice and/or research preferred
  • Technical
    • Procedural programming for data manipulation using Python, NumPy, and Pandas
    • PHP, Java, or other languages are a plus
    • Knowledge of relational database platforms and data modeling
    • Comfortable extracting data from and loading data into sources ranging from an Enterprise Data Warehouse to an Excel or text file, using built-in tools or custom-written ETL scripts
    • Knowledge of data aggregation and transformation processes (e.g. pivot, merge, union, hierarchical grouping, aggregation functions)
    • Above-average SQL skills (e.g. familiarity with subqueries, multiple joins, and grouping), specifically in MySQL; SQL Server experience is a plus
    • Comfortable with complex multi-stage, multi-technology ETL pipelines
    • Comfortable using APIs to transmit data in both an ad-hoc and automated manner
    • Familiar with concepts/tools of Data Quality Management as well as Data Governance practices
  • Professional
    • Ability to interpret and follow through on data requirements with strong attention to detail
    • Strength in independently validating and debugging code and analyses, including consulting documentation, Stack Exchange, etc.
    • Demonstrates personal initiative and time management skills, as well as the ability to work effectively and kindly as part of a team
    • Excellent verbal and written communication skills
    • Familiar with agile software development methodologies
    • Interest in identifying process improvement opportunities is a plus

LICENSES, CERTIFICATIONS, and/or REGISTRATIONS:

  • Required: Undergraduate degree in Health Informatics, Computer Science, Statistics, Mathematics, Engineering, or a related subject.
  • Preferred: Graduate degree in one of the above.

Preferred coursework would include most of the following:

  • Intermediate Databases and SQL
  • Intermediate Programming (Procedural and/or OO)
  • Data Structures and Algorithms
  • Data Quality Management
  • Data Flow and Automation
  • Agile Project Management

Equivalent Experience – Equivalent time and aptitude achieved through work experience may substitute for some of the preferred courses listed above.

EXPERIENCE:

Preferred: 2+ years of experience in data management in a healthcare/clinical setting; however, recent or anticipated college graduates will be considered.

SUPERVISORY RESPONSIBILITY (authority to hire, promote, or terminate):


FISCAL RESPONSIBILITY:


WORKING CONDITIONS:

100% remote through Aug 31, 2021; up to 100% remote afterwards, TBD.

EEO Statement

Massachusetts General Hospital is an Equal Opportunity Employer. By embracing diverse skills, perspectives and ideas, we choose to lead. Applications from protected veterans and individuals with disabilities are strongly encouraged.
Primary Location: MA-Charlestown-MGB OCC
Work Locations: MGB OCC One Constitution Center Charlestown 02129
Job: IT/Health IT/Informatics-Engineer
Organization: Massachusetts General Hospital (MGH)
Schedule: Full-time
Standard Hours: 40
Shift: Day Job
Employee Status: Regular
Recruiting Department: MGH Psychiatry
Job Posting: Jul 6, 2021
