We are looking for expertise in real-time data warehousing and in building large-scale batch and streaming data processing systems using a modern data stack.
The Data Engineer is responsible for building and running data pipelines, designing our data warehouse/lake and data frameworks, and supporting different data presentation techniques.
Responsibilities
- Coder at heart: someone who can define, write, deploy, and maintain a codebase independently.
- Define, execute, and manage large-scale ETL processes to build the data lake and data warehouse and to support development.
- Demonstrate strong knowledge of building real-time data analytics pipelines using Kafka, Druid, and Airflow.
- Bring strong problem-solving capabilities and the ability to quickly propose feasible solutions and to communicate strategy and risk-mitigation approaches effectively to leadership.
- Build ETL pipelines in Spark and Presto that process transaction- and account-level data and standardize data fields across various data sources (a minimal orchestration sketch follows this list).
- Experience creating/supporting production software/systems and a proven track record of identifying and resolving performance bottlenecks for production systems.
- Exposure to deploying large data pipelines to scale ML/AI models built by the data science teams and experience with the development of models is a strong plus.
- Build and maintain high-performing ETL processes, including data quality and testing aligned across technology, internal reporting, and other functional teams.
- Create data dictionaries, set and monitor data validation alerts, and execute periodic jobs such as performance dashboards and predictive model scoring for client deliverables.
- Define and build technical/data documentation, working with code version control systems (e.g., Git).
- Ensure data accuracy, integrity, and consistency.
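As a concrete illustration of the Airflow-orchestrated Spark ETL work listed above, here is a minimal sketch, assuming Airflow 2.4+ with the Apache Spark provider installed; the DAG name, schedule, application path, and connection ID are hypothetical placeholders, not details of our actual pipelines.

```python
# Hypothetical minimal Airflow DAG: schedules a daily Spark ETL job that
# standardizes transaction-level data. Paths and IDs are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_transaction_etl",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # Airflow 2.4+ syntax
    catchup=False,
) as dag:
    standardize_transactions = SparkSubmitOperator(
        task_id="standardize_transactions",
        application="/opt/jobs/standardize_transactions.py",  # placeholder Spark script
        conn_id="spark_default",
    )
```

In practice, a DAG like this would chain additional tasks (data-quality checks, warehouse loads, alerting) after the Spark submit step.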
Requirements
- Strong understanding of development and implementation aspects of data pipelines for ML/AI, especially on billion-scale datasets.
- Ability to take models developed at small scale as input and implement them with the requisite configuration and customization while maintaining model performance.
- Strong written, verbal, and interpersonal skills are needed to effectively communicate technical insights and recommendations with business customers and the leadership team.
- Exposure to model management and governance practices.
- Ability to make decisions around model drift and to monitor and refine models continuously.
- 4+ years of work experience with a Bachelor's degree, or 3+ years of work experience with a Master's or advanced degree, in an analytical field such as computer science, statistics, finance, economics, or a related area.
- Strong experience in creating large-scale data engineering frameworks/pipelines, data-based decision-making, and quantitative analysis.
- Strong experience with batch and real-time data management.
- Advanced experience in writing and optimizing efficient SQL queries and handling large datasets in big-data environments (see the illustrative sketch after this list).
- Experience with shell and Python scripting and exposure to scheduling tools such as Apache Airflow.
- Advanced knowledge of big-data ecosystems and associated technologies (e.g., Apache Spark, EMR, EKS, Redshift, Athena/Presto, Kafka, Airflow) is a must.
- Languages preferred - Python, Scala.
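To give a flavor of the SQL-on-large-datasets work referenced above, here is a hedged PySpark sketch; the storage paths, column names, and partition filter are assumptions for illustration only.

```python
# Hypothetical PySpark job: aggregates transaction data into daily totals,
# reading only the partitions needed rather than scanning the full table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_transaction_rollup").getOrCreate()

transactions = (
    spark.read.parquet("s3://data-lake/transactions/")  # placeholder path
    .where(F.col("txn_date") >= "2024-01-01")           # partition pruning
)

daily_totals = (
    transactions
    .groupBy("account_id", "txn_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("txn_count"),
    )
)

daily_totals.write.mode("overwrite").parquet("s3://data-warehouse/daily_totals/")  # placeholder path
```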
Additional/Preferred Qualifications
- Experience building a data lake/lakehouse from scratch is preferred.
- Experience maintaining and optimizing infrastructure for BI tools such as Superset or Metabase is a plus.
- Experience developing and maintaining backend REST APIs for data platforms using Java or Go is a strong plus.
- Experience with a semantic layer around the data platform is a strong plus.
Important Traits We Look for in This Role
- Independent, resourceful, analytical, and able to solve problems effectively.
- Ability to be flexible and agile, and to thrive in chaos.
- Excellent oral and written communication skills.
This job was posted by Nirvesh Mehrotra from GoKwik.