The SickKids Enterprise Data and Analytics Office is building the team to support Expedition, SickKids' new Enterprise Data Platform powered by Cloudera. The Data Engineering Analyst will play a pivotal role in designing, developing, enhancing, maintaining, and testing data pipelines to provide clinicians, staff, researchers, and trainees with reliable access to enterprise data for advanced analytics and machine learning. This role is a key enabler of our vision to achieve a strong data foundation and advance the hospital's strategic objectives.
The Data Engineering Analyst will create enterprise data pipelines using industry-standard practices and tools. This involves building, managing and optimizing data pipelines and moving them into production for business data analysts, data scientists and other approved users who need curated data for analytics use cases. The role also ensures compliance with data governance and data security requirements while creating, improving and operationalizing these integrated and reusable data pipelines. Success will be measured by faster data access for end users, greater reuse of integrated data and improved time-to-solution for data and analytics initiatives.
To be successful, you possess both creative problem-solving and collaboration skills, enjoy working with end users to understand their data requirements, and can communicate effective data management practices. You will design solutions that take full advantage of the capabilities of Expedition. You may also collaborate with data scientists, data analysts and other data consumers, optimizing the models and algorithms they develop for data quality, security and governance and putting them into production.
Here's What You Will Get To Do
- Build data pipelines for specific use cases, from acquisition at the data source or endpoint through integration to consumption. This will include:
o Create, test, maintain and troubleshoot data ingestion flows with NiFi
o Learn data source APIs
o Develop Python scripts to transform data and interact with data platform APIs (an illustrative sketch follows this list)
o Build data pipelines to deliver dynamically updated curated datasets to end users and processes (e.g. dashboards, ML models)
o Promote data pipelines from development to testing to production
o Document workflows and update standard playbooks for future data integration activities
o Create, test and maintain data access rule-sets
o Monitor schema changes
- Drive automation through effective metadata management: use modern tools, techniques and architectures to partially or fully automate the most common, repeatable data preparation and integration tasks, minimizing manual, error-prone processes and improving productivity. This will include (but not be limited to):
o Learning and using modern data preparation, integration and AI-enabled metadata management tools and techniques enabled by the Expedition Data Platform (powered by Cloudera).
o Performing intelligent sampling and caching.
o Maintaining data source metadata and running Python scripts to update our data catalogue.
- Collaborate with varied stakeholders across departments within the organization. This will include assisting data users with data access tools and techniques.
- Help ensure compliance and governance of data pipelines onboarded to the Enterprise Data Platform so that data users and consumers use the data provisioned to them responsibly. Work with Data Stewards to vet and promote content created in the business and by data scientists to the curated data catalogue for governed reuse. This will include, but not be limited to, tagging data for privacy and sensitivity according to the enterprise data classification schema.
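For a flavour of the scripting work described in the list above, here is a minimal, purely illustrative Python sketch of a transformation step that also registers dataset metadata with a catalogue API. The file names, column names and catalogue endpoint are hypothetical assumptions for illustration only and do not describe Expedition's actual data or APIs.

```python
# Illustrative sketch only -- the file names, columns and catalogue endpoint
# below are hypothetical and do not describe Expedition's actual data or APIs.
import pandas as pd
import requests

SOURCE_CSV = "encounters_extract.csv"                          # hypothetical source extract
CATALOGUE_URL = "https://catalogue.example.org/api/datasets"   # hypothetical catalogue endpoint


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Apply simple, repeatable cleanup steps before curation."""
    df = df.rename(columns=str.lower)                          # standardize column names
    df["admit_date"] = pd.to_datetime(df["admit_date"], errors="coerce")
    return df.dropna(subset=["patient_id"])                    # drop rows missing the key


def publish_metadata(df: pd.DataFrame, name: str) -> None:
    """Register basic dataset metadata with the (hypothetical) catalogue API."""
    payload = {
        "name": name,
        "row_count": int(len(df)),
        "columns": [{"name": c, "dtype": str(t)} for c, t in df.dtypes.items()],
    }
    response = requests.post(CATALOGUE_URL, json=payload, timeout=30)
    response.raise_for_status()


if __name__ == "__main__":
    curated = transform(pd.read_csv(SOURCE_CSV))
    curated.to_csv("encounters_curated.csv", index=False)      # hand off the curated dataset
    publish_metadata(curated, name="encounters_curated")
```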
Here's What You'll Need
- At least four years of work experience in data management disciplines including data integration, modeling, optimization and data quality, and/or other areas directly relevant to data engineering responsibilities and tasks.
- At least three years of experience working in cross-functional teams and collaborating with business stakeholders in support of a departmental and/or multi-departmental data management and analytics initiative.
- Demonstrated experience with data management architectures such as data warehouses, data lakes and data hubs, and with supporting processes such as data integration, governance and metadata management.
- Demonstrated experience designing, building and managing data pipelines and data structures, encompassing data transformation, data models, schemas, metadata and workload management.
- Demonstrated experience working with large, heterogeneous datasets to build and optimize data pipelines, pipeline architectures and integrated datasets using traditional data integration technologies, including ETL/ELT, data replication/CDC, message-oriented data movement and API design and access, as well as emerging data ingestion and integration technologies.
- Demonstrated experience with popular database programming languages (e.g. SQL, PL/SQL, Python) for relational databases, and certifications in emerging NoSQL/Hadoop-oriented databases (e.g. MongoDB, Cassandra) for non-relational data.
- Demonstrated experience working with big data tools and technologies such as Hive, Impala, Spark and Airflow, whether open source or equivalent commercial tools.
- Experience automating pipeline development: understanding of DevOps capabilities such as version control, automated builds, testing and release management, using tools like Git, Jenkins, Puppet and Ansible.
- Basic experience working with popular data discovery, analytics and BI tools such as Power BI for semantic-layer-based data discovery.
- A commitment to understanding and aiding in the pursuit of equity, diversity & inclusion.
Desired Skills
- Exposure to hybrid deployments: ability to work across multiple deployment environments (cloud, on-premises and hybrid), multiple operating systems, and containerization technologies such as Docker and Kubernetes.
- Familiarity with LDAP, Kerberos and RStudio
- A passion for data and innovation; an energetic self-starter with a willingness to learn new skills and technologies
Employment Type: 1 Permanent Full-time & 1 Temporary Full-time (1.0 FTE, 35 hours/week)