Role Overview:
As a Data Engineer, you will design, develop, and optimize data pipelines and infrastructure on Google Cloud Platform (GCP) to enable advanced analytics and reporting solutions. You will work closely with business stakeholders to deliver robust, scalable solutions that support business intelligence and machine learning initiatives.
This hands-on role requires a deep understanding of BigQuery, data engineering best practices, and the ability to translate business requirements into technical solutions. If you are passionate about working with big data and cloud technologies, we would love to hear from you!
Key Responsibilities:
- Data Pipeline Development: Design and build ETL/ELT pipelines using BigQuery and other GCP services to ingest, process, and transform large datasets from multiple sources.
- Data Modeling & Architecture: Develop and optimize data models and schemas to support analytics, reporting, and machine learning requirements.
- Performance Optimization: Implement best practices for performance tuning, partitioning, and clustering to optimize data queries and reduce costs in BigQuery.
- Data Integration & Transformation: Collaborate with data scientists and analysts to design data solutions that integrate seamlessly with BI tools, machine learning models, and third-party applications.
- Data Quality & Governance: Establish and enforce data quality standards, data governance frameworks, and security policies for data storage and access on GCP.
- Automation & Monitoring: Automate workflows using Cloud Composer, Cloud Functions, or other orchestration tools to ensure reliable and scalable data pipelines.
- Documentation & Knowledge Sharing: Create comprehensive documentation for data pipelines, workflows, and processes. Share best practices and mentor junior data engineers.
Required Qualifications:
- 3+ years of experience working as a Data Engineer, with a focus on GCP and BigQuery.
- Strong proficiency in SQL and experience in developing complex queries, stored procedures, and views in BigQuery.
- Hands-on experience with GCP services such as Cloud Storage, Dataflow, Cloud Composer, and Cloud Functions.
- Understanding of data warehousing concepts, dimensional modeling, and building data marts.
- Experience with ETL/ELT tools such as Apache Beam, Dataflow, or dbt.
- Familiarity with scripting languages such as Bash, Python, or JavaScript for automation and integration.
- Proven ability to work with large datasets and optimize query performance cost-effectively.
- Excellent communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams.
- Google Cloud Professional Data Engineer certification is a plus.
Preferred Skills:
- Experience with machine learning on GCP using Vertex AI or AI Platform.
- Knowledge of data governance and security best practices in a cloud environment.
- Experience working with real-time streaming data and tools like Pub/Sub or Kafka.