We are looking for a data engineer with experience building data pipelines into a big data cluster and deploying data science engines that work with the data on this cluster.
Our current landscape consists of a Hadoop cluster where we gather data from many different sources into our data lake. The data is extracted through API calls and/or downloads from FTP servers. It is then pushed to a SQL Server database, which our dashboards connect to and on which we build data science engines. The technologies we use for this are:
- Scala
- Spark
- Hadoop Distributed File System (HDFS)
- Python
- Linux command line
- SQL
- Google BigQuery
- Git
Preferred:
- SAP systems
- SAP BW development