Cloud Data Engineer, St. Louis, MO
Responsibilities
- Develop efficient extract, load, and transform pipelines from a wide variety of data sources using Python and Spark
- Use GCP big data technologies such as Cloud Dataproc and BigQuery
- Schedule and orchestrate solutions using GCP Cloud Composer
- Create real-time data streaming pipelines using GCP App Engine and Dataflow
- Perform data exploration in support of new projects and to troubleshoot issues
- Work closely with data analysts to facilitate data acquisition and enable efficient analysis
- Adapt existing batch and real-time data pipelines to accommodate new business requirements
- Develop and deploy code with exception handling, logging, and monitoring
- Ensure data quality throughout the entire data processing pipeline
- Perform unit testing, integration testing, and assist with user acceptance testing
- Proactively clarify requirements when needed to ensure accurate results
Experience
- Bachelor’s degree in Computer Science or equivalent experience required
- 3+ years of hands-on experience with Python
- 3+ years of hands-on SQL and relational database experience
- 2+ years of hands-on experience in a public cloud development environment (GCP preferred)
- Experience setting standards for design, documentation, testing, and code quality
- Proficient understanding of distributed computing principles
- Experience with messaging systems such as Kafka and GCP Pub/Sub
- Ability to conduct code reviews and provide constructive feedback to peer team members
- Excellent communication skills