- Minimum of 6 years of relevant experience, with strong hands-on experience in SQL, Python, and PySpark for building and optimizing data workflows.
- In-depth knowledge of Azure Databricks for data engineering, including proficiency in Apache Spark, PySpark, and Delta Lake.
- Familiarity with Databricks components such as Workspace, Runtime, Clusters, Workflows, Delta Live Tables (DLT), Functions, Hive Metastore, SQL Warehouse, Delta Sharing, and Unity Catalog.
- Knowledge of the insurance industry is preferred.
- Experience in ETL/ELT and data integration with an understanding of enterprise data models like CDM and departmental data marts.
- Proficiency in using Azure Data Factory (ADF) to build complex data pipelines and integrate data from various sources.
- Familiarity with Azure Purview, Azure Key Vault, Azure Active Directory, and RBAC for managing security and compliance in data platforms.
Desired Skills and Experience
Essential:
- 6+ years of experience in Data Engineering using SQL and Python.
- Strong understanding of data lake and data warehouse design principles.
- Hands-on experience with cloud-based ETL and orchestration services (e.g., AWS EMR, Airflow).
- Familiarity with MLOps frameworks (e.g., AWS SageMaker).
- Experience with distributed computing systems (e.g., Spark, Hive, Hadoop).
- Proficiency with databases such as Postgres, MySQL, and Oracle.
- Strong English communication skills, both written and spoken.
Desirable:
- Experience with other cloud platforms (e.g., GCP, Azure).
- Understanding of Machine Learning and Deep Learning.
- Proficiency in real-time data streaming technologies (e.g., Kafka, Spark Streaming).