Ensure data accuracy, integrity, privacy, security, and compliance through rigorous quality control procedures.
Monitor and analyze the performance of data systems, implementing optimization strategies as needed.
Develop ETL (extract, transform, load) processes that consolidate and transform data from multiple sources.
Acquire, clean, and prepare relevant datasets for analysis and reporting.
Document processes, results, and methodologies to ensure clarity and facilitate maintenance.
Collaborate with data scientists, analysts, and other stakeholders to understand data needs and provide integrated solutions.
Work with version control systems, specifically Git, to manage the codebase.
Ensure effective data monitoring and implement improvements as necessary.
Stay updated with the latest industry trends and technologies to ensure best practices are applied.
Develop and maintain data pipelines using technologies such as Snowflake, Python, and workflow management tools (e.g., Airflow); see the pipeline sketch after this list.
Utilize advanced SQL skills to write and optimize complex queries; see the example query after this list.
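For illustration, a minimal sketch of the kind of Airflow ETL pipeline the role involves. The DAG id, the stub data, and the task bodies are hypothetical placeholders, not a prescribed implementation (the `schedule` argument assumes Airflow 2.4+).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # A real task would query a source system; a stub record set stands in here.
    return [{"order_id": 1, "amount": 125.0}, {"order_id": 2, "amount": -5.0}]


def transform(ti):
    # Pull the extract output from XCom and drop obviously invalid rows.
    rows = ti.xcom_pull(task_ids="extract")
    return [row for row in rows if row["amount"] >= 0]


def load(ti):
    # A real task would write to Snowflake; logging the row count stands in.
    rows = ti.xcom_pull(task_ids="transform")
    print(f"loaded {len(rows)} rows")


with DAG(
    dag_id="daily_orders_etl",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",          # 'schedule_interval' on Airflow < 2.4
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```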
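Likewise, a sketch of the sort of "complex query" meant here: a CTE with a window function, run through the Snowflake Python connector. The connection parameters and the `orders` table are hypothetical; real credentials would come from a secrets manager.

```python
import snowflake.connector  # snowflake-connector-python package

# Placeholder connection parameters; real values belong in a secrets manager.
conn = snowflake.connector.connect(
    user="<user>",
    password="<password>",
    account="<account>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)

# Latest order per customer via a CTE and a window function.
QUERY = """
WITH ranked_orders AS (
    SELECT
        customer_id,
        order_id,
        order_date,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY order_date DESC
        ) AS rn
    FROM orders  -- hypothetical table
)
SELECT customer_id, order_id, order_date
FROM ranked_orders
WHERE rn = 1
"""

with conn.cursor() as cur:
    cur.execute(QUERY)
    for customer_id, order_id, order_date in cur.fetchall():
        print(customer_id, order_id, order_date)
conn.close()
```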
Requirements
Minimum qualifications
Bachelor's degree in Computer Science, Engineering, or a related field.
1-2 years of industry experience as a Data Engineer or in a similar role.
Proficiency in object-oriented programming languages, particularly Python, and experience with related libraries such as Pandas, Dask, NumPy, and Matplotlib.
Advanced SQL skills and experience with relational databases and database design.
Experience working with cloud data warehouse solutions (e.g., Snowflake, Redshift, BigQuery, Azure Synapse).
Proficiency in building data pipelines with workflow management tools (e.g., Airflow).
Experience working with version control systems, specifically Git.
Proven experience in developing and maintaining ETL/ELT pipelines.
Expertise in pre-processing and cleaning large datasets for model training; a minimal sketch follows this list.
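As a final illustration, a minimal pre-processing pass in Pandas, assuming a hypothetical `target` label column; it shows the kind of cleaning (column normalization, de-duplication, missing-value handling) the last bullet refers to.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning pass before a dataset is handed to model training."""
    out = df.copy()

    # Normalize column names for downstream consistency.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")

    # Drop rows missing the label, then exact duplicates.
    out = out.dropna(subset=["target"])  # 'target' is a hypothetical label column
    out = out.drop_duplicates()

    # Impute remaining numeric gaps with the column median.
    numeric_cols = out.select_dtypes(include=np.number).columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())

    return out


if __name__ == "__main__":
    raw = pd.DataFrame({"Target ": [1, 0, None, 1], "age": [34, None, 29, 34]})
    print(preprocess(raw))
```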