Our client is a publicly traded German online fashion and beauty retailer active across Europe.
Responsibilities
Develop, monitor, and operate the client's most used and most critical curated data pipeline, Sales Order Data (including post-order information such as shipments, returns, and payments). This pipeline processes hundreds of millions of records to provide high-quality datasets for analytical and machine learning use cases
Consult with analysts, data scientists, and product managers to build and continuously improve Single Source of Truth KPIs for business steering, such as the central Profit Contribution measurement (PC II)
Leverage and improve a cloud-based tech stack that includes AWS, Databricks, Kubernetes, Spark, Airflow, Python, and Scala (a minimal pipeline sketch follows this list)
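For illustration, here is a minimal sketch of the kind of Spark Structured Streaming step such a pipeline might contain, written in PySpark against Delta Lake and assuming a Delta-enabled Spark environment (e.g. Databricks); the S3 paths, schema, and watermark settings are hypothetical placeholders, not the client's actual implementation.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sales-order-curation").getOrCreate()

# Read raw sales order events (hypothetical S3 location and simplified schema).
raw = (
    spark.readStream.format("json")
    .schema("order_id STRING, status STRING, amount DOUBLE, updated_at TIMESTAMP")
    .load("s3://example-bucket/raw/sales_orders/")
)

# Curate: drop duplicate events within a one-hour watermark window.
curated = raw.withWatermark("updated_at", "1 hour").dropDuplicates(
    ["order_id", "updated_at"]
)

# Write to a Delta table; the checkpoint makes the stream restartable with
# exactly-once semantics on the Delta sink.
(
    curated.writeStream.format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/sales_orders/")
    .outputMode("append")
    .start("s3://example-bucket/curated/sales_orders/")
)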
Requirements
Expertise in Apache Spark, including Spark Streaming and Spark SQL
Solid hands-on experience with Databricks and Delta Lake
Fluency in the Scala programming language
Good understanding of and hands-on experience with CI/CD
Ability to build Apache Airflow pipelines (see the sketch after this list)
Extensive working experience with GitHub
Fluency with the AWS landscape
Upper-Intermediate level of English, both spoken and written (B2+)
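To make the Airflow requirement concrete, here is a minimal DAG sketch; the DAG id, schedule, and the run_curation callable are hypothetical placeholders rather than the client's actual orchestration code.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_curation():
    # Placeholder for triggering the curation job (e.g. a Databricks Spark run).
    print("Triggering sales order curation job")

with DAG(
    dag_id="sales_order_curation",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="curate_sales_orders", python_callable=run_curation)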
Nice to have
Presto
Superset
Starburst
Exasol
We Offer
Competitive compensation depending on experience and skills
Variety of projects within one company
Being part of a project that follows engineering excellence standards
Individual career path and professional growth opportunities