Key Responsibilities
Data Ingestion: Design and implement data ingestion pipelines using Databricks and PySpark, with a focus on Auto Loader for efficient, incremental data processing (an illustrative sketch follows this list).
Nested JSON Handling: Develop and maintain processes for handling complex nested JSON files, ensuring data integrity and accessibility.
API Integration: Integrate and manage data from various APIs, ensuring seamless data flow and consistency.
Data Modeling: Create and optimize data models to support analytics and reporting needs.
Performance Optimization: Optimize data processing and storage solutions for performance and cost-efficiency.
Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver effective solutions.
Data Quality: Ensure the accuracy, integrity, and security of data throughout the data lifecycle.
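As a rough illustration of the ingestion and nested-JSON responsibilities above, here is a minimal PySpark sketch of a Databricks Auto Loader job that reads nested JSON incrementally and flattens it into a Delta table. It is a sketch under assumptions, not part of the role description: the storage paths, table name, and fields (order_id, customer, items) are hypothetical, and `spark` is the session Databricks provides.

```python
from pyspark.sql import functions as F

# Hypothetical paths and schema; adjust to the actual source and workspace layout.
raw = (
    spark.readStream
    .format("cloudFiles")                      # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders/_schema")
    .load("/mnt/raw/orders/")
)

# Flatten the nested payload: promote struct fields and explode an array of line items.
flat = (
    raw
    .select(
        F.col("order_id"),
        F.col("customer.id").alias("customer_id"),
        F.explode("items").alias("item"),
    )
    .select("order_id", "customer_id", "item.sku", "item.quantity")
)

# Write incrementally to a Delta table, with checkpointing for exactly-once processing.
(
    flat.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders/_checkpoint")
    .trigger(availableNow=True)
    .toTable("analytics.orders_flat")
)
```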
Qualifications
Technical Expertise: Proficiency in Databricks, PySpark, and SQL. Strong experience with Auto Loader and with handling nested JSON files.
API Experience: Demonstrated experience in integrating and managing data from various APIs.
Problem-Solving Skills: Strong analytical and problem-solving abilities.
Communication Skills: Excellent communication skills to collaborate with cross-functional teams.
Experience: 3-5 years of experience in data engineering, data integration, and data modeling.
Education: A degree in Computer Science, Engineering, or a related field is preferred.
Preferred Qualifications
Experience with cloud platforms such as AWS, Azure, or Google Cloud.
Familiarity with data warehousing concepts and tools.
Knowledge of data governance and security best practices.