Data Engineer with experience in PySpark, AWS, Git, Databricks & SQL
Responsibilities
Data Ingestion from various sources to AWS
Building and maintaining Scalable Data Pipelines on AWS
Building API Integrations for Data Transfer
Scheduling/Automating jobs on AWS
Developing ETL jobs and optimizing code for performance in PySpark and AWS Glue (see the sketch after this list)
Schema design and data architecture
Collaborating cross-functionally with people at various levels across the organization
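By way of illustration, here is a minimal sketch of the kind of Glue ETL job in scope; the database, table, bucket, and column names are hypothetical, and the awsglue imports are only available inside the Glue job runtime:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job boilerplate: resolve arguments and initialize the job.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Ingest from the Glue Data Catalog (hypothetical database/table).
orders = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
).toDF()

# Transform: keep completed orders, derive a date column, deduplicate.
clean = (
    orders.filter(F.col("order_status") == "COMPLETED")
    .withColumn("order_date", F.to_date("order_ts"))
    .dropDuplicates(["order_id"])
)

# Load: write to S3 partitioned by date so downstream reads can prune.
clean.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-curated-bucket/orders/"
)

job.commit()
```

The partitioned Parquet write at the end is a common Glue pattern: downstream queries that filter on the date can prune partitions instead of scanning the full dataset.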
Working Conditions
In an office environment or remote
No anticipated need for travel
No anticipated need for lifting heavy objects
Candidate Profile
Bachelor's or master's degree from an accredited four-year college in computer science, software engineering, information technology, or another quantitative field.
2+ years of experience with AWS Glue and PySpark.
Demonstrated experience with Terraform.
Advanced knowledge of AWS Glue and PySpark, including Spark optimization techniques and the PySpark APIs (see the sketch after this list), and familiarity with AWS Database Migration Service, including migrating data from sources such as SQL Server, Oracle, and CSV files.
Comfortable working with Amazon Kinesis.
Working understanding of Git, including pull, push, branching, and merge workflows.
Experience with Databricks notebooks.
Highly innovative, flexible, and self-directed.
Experience communicating with users, other technical teams, and management to gather requirements.
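To illustrate the kind of Spark optimization technique referenced above, here is a minimal broadcast-join sketch; the paths, table contents, and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("optimization-sketch").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
events = spark.read.parquet("s3://example-bucket/events/")
regions = spark.read.parquet("s3://example-bucket/regions/")

# Broadcasting the small table ships a copy to every executor, so the
# large table can be joined without an expensive shuffle.
joined = events.join(F.broadcast(regions), on="region_id", how="left")

# Partitioning the output by the common filter column speeds up
# downstream reads via partition pruning.
joined.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/events_enriched/"
)
```

Inspecting the plan with joined.explain() should show a BroadcastHashJoin rather than a SortMergeJoin, an easy way to confirm the optimization took effect.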
Preferred Qualifications
Experience with AWS services such as Database Migration Service and Kinesis (see the sketch after this list), and an understanding of core AWS services, their uses, and basic AWS architecture best practices, particularly around PySpark and AWS Glue.
Significant experience in the deregulated energy or power generation industries.
Knowledgeable about client terminology, concepts, and the value chain.
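As a pointer to the Kinesis experience in view, here is a minimal producer sketch using boto3; the stream name, region, and record shape are hypothetical:

```python
import json

import boto3

# Hypothetical stream and region; credentials come from the environment.
kinesis = boto3.client("kinesis", region_name="us-east-1")

record = {"device_id": "sensor-42", "reading": 21.7}

# PartitionKey determines which shard receives the record; records with
# the same key land on the same shard, preserving per-key ordering.
kinesis.put_record(
    StreamName="example-ingest-stream",
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["device_id"],
)
```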
Additional Knowledge, Skills And Abilities
2+ years of advanced professional quantitative experience.
Advanced SQL coding, tuning, and query optimization experience (see the sketch after this list).
Experience with large-scale data analytics or data science.
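As an example of the SQL tuning in scope, here is a minimal Spark SQL sketch; the table, columns, and partitioning scheme are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-tuning-sketch").getOrCreate()

# Hypothetical catalog table partitioned by sale_date.
aggregated = spark.sql(
    """
    SELECT customer_id,
           SUM(amount) AS total_spend
    FROM sales
    WHERE sale_date >= DATE '2024-01-01'  -- filter on the partition column
    GROUP BY customer_id
    """
)

# explain() prints the physical plan; with a partitioned table, the
# date predicate should appear as partition pruning rather than a
# full scan followed by a filter.
aggregated.explain()
```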