As a data engineer you leverage your passion for data by bringing Rabobank’s data-driven ambitions to fruition. You support all business initiatives with a need for data (structured and unstructured, high volume and/or streaming), and especially KYC. We see collaboration as the best way to deliver strong results. We use Scrum as an agile way of working in our Tribe. We bring together talented people to work on our data systems, including data warehousing, Hadoop and the Azure cloud.
Your role
You will be organizing the data operations for analytics projects that give Rabobank a leading edge in banking.
You get excited by building and deploying big data pipelines in the cloud. In the Squad you work on Azure, where you use cloud-native components and value the separation between storage and compute.
Data frames are not a new concept to you, and you write code in Python or PySpark for your ETL needs. Your deployments are automated and you are able to prototype your solutions quickly.
You will operate within a complex web of data sources and advise on and execute data flows that lead to new insights;
You will be a specialist in data ingestion, building robust automated data ingestion jobs that run flawlessly to fetch data from the data lake and other sources inside and outside the bank;
You will be working on different platforms and with different tools, orchestrating data pipelines and connecting systems and data environments on-premises and in the cloud;
Your keen sense of security will ensure that the data we work with is protected, both in transit and in use, and that datasets are anonymized or pseudonymized to safeguard sensitive data;
You will actively participate in knowledge sharing, meetups, code reviews and deep dives to develop the ambitious team that you are part of;
You will work in direct contact with our multi-disciplinary project teams, which include roles such as data scientists, business consultants and data engineers.
Your experience and competencies
You have experience using the following concepts and components in a production environment:
Azure Data Factory
Azure Data Lake Storage
Spark using Python (or Scala)
Scheduling and triggering using Airflow
Building and deploying with Azure DevOps using YAML and Bash
Setting up data quality checks and monitoring
Developing unit tests (TDD) using Python
Data modelling (like data vault, dimensional modelling) and SQL
Good knowledge of OO design patterns
The competencies we would like you to have:
Eager to learn new techniques
Enjoy teaching and sharing knowledge
Good communication skills
Strong collaboration with other roles in the Scrum team
Drive to continuously improve
Take ownership of the build, run and change process