SUMMARY: The Data Engineer plays a pivotal role in operationalizing the Bank's most urgent data and analytics initiatives for its digital business by building, managing and optimizing data pipelines and then moving those pipelines effectively into production for key data and analytics consumers.
ESSENTIAL DUTIES AND RESPONSIBILITIES include the following. Other duties and special projects may be assigned.
- Design, create and maintain data pipelines. This is the primary responsibility of the Data Engineer.
- Drive automation through effective metadata management.
- Assist with renovating the data management infrastructure to drive automation in data integration and management.
- Utilize modern data preparation, integration and AI-enabled metadata management tools and techniques.
- Track data consumption patterns.
- Perform intelligent sampling and caching.
- Monitor schema changes.
- Recommend and automate integration flows.
SENIOR LEVEL RESPONSIBILITIES
- Work with data science teams and with business (data) analysts to refine their data requirements for various data and analytics initiatives.
- Propose appropriate (and innovative) data ingestion, preparation, integration and operationalization techniques.
- Train counterparts such as data scientists, data analysts, line-of-business (LOB) users and other data consumers in data pipelining and preparation techniques.
- Ensure that data users and consumers use the data provisioned to them responsibly through data governance and compliance initiatives. Participate in vetting and promoting content created in the business and by data scientists to the curated data catalog for governed reuse.
- Become a data and analytics evangelist by promoting the available data and analytics capabilities and expertise to business unit leaders and educating them in leveraging these capabilities in achieving their business goals.
EDUCATION and/or EXPERIENCE:
A bachelor's degree in computer science, statistics, applied mathematics, data management, information systems, information science or a related quantitative field is required. An advanced degree in computer science (MS), statistics, applied mathematics (Ph.D.), information science (MIS), data management, information systems, information science (postgraduate diploma or related) or a related quantitative field, or equivalent work experience, is preferred.
The ideal candidate will have a technical or computer science degree and a combination of IT skills, data governance skills, analytics skills and banking domain knowledge.
At least four years (Level I) or six years (Level II) of work experience in data management disciplines including data integration, modeling, optimization and data quality, and/or other areas directly relevant to data engineering responsibilities and tasks.
At least three years of experience working in cross-functional teams and collaborating with business stakeholders in the banking business domain, in support of a departmental and/or multi-departmental data management and analytics initiative.
Level I must also possess the majority of the following; Level II must possess all of the following:
- Strong experience with advanced analytics tools for object-oriented/object function scripting using languages such as R, Python, Java and Scala.
- Strong ability to design, build and manage data pipelines for data structures encompassing data transformation, data models, schemas, metadata and workload management. The ability to work with both IT and business in integrating analytics and data science output into business processes and workflows.
- Strong experience with popular database programming languages, including SQL and PL/SQL for relational databases, and certifications in emerging NoSQL/Hadoop-oriented nonrelational databases such as MongoDB and Cassandra.
- Strong experience in working with large, heterogeneous datasets in building and optimizing data pipelines, pipeline architectures and integrated datasets using traditional data integration technologies. These should include ETL/ELT, data replication/CDC, message-oriented data movement, and API design and access, as well as emerging data ingestion and integration technologies such as stream data integration, complex event processing (CEP) and data virtualization.
- Strong experience in working with SQL-on-Hadoop tools and technologies, including open-source options such as Hive, Impala and Presto and commercial vendor offerings such as Hortonworks DataFlow (HDF), Dremio, Informatica and Talend, among others.
- Strong experience in working with and optimizing existing ETL processes, data integration flows and data preparation flows, and in helping to move them into production.
- Strong experience in working with both open-source and commercial message queuing technologies (Kafka, JMS, Azure Service Bus, Amazon Simple Queue Service), stream data integration technologies (Apache NiFi, Apache Beam, Kafka Streams, Amazon Kinesis and others) and stream analytics technologies (Apache Kafka, KSQL, Apache Spark).
- Basic experience working with popular data discovery, analytics and BI software tools such as Tableau and OBI for semantic-layer-based data discovery.
- Strong experience in working with data science teams in refining and optimizing data science and machine learning models and algorithms.
- Basic experience in working with data governance teams and specifically business data stewards and the CISO in moving data pipelines into production with appropriate data quality, governance and security standards and certification.
- Demonstrated ability to work across multiple deployment environments (cloud, on-premises and hybrid) and multiple operating systems, and with containerization technologies such as Docker, Kubernetes, Amazon Elastic Container Service and others.
- Proficiency in agile methodologies and the ability to apply DevOps and, increasingly, DataOps principles to data pipelines to improve the communication, integration, reuse and automation of data flows between data managers and consumers across an organization.
- Deep domain knowledge of, or previous experience working in, the banking business is a plus.