Providing the organization's data consumers with high-quality data sets by curating, consolidating, and transforming data from a wide variety of large-scale (terabyte-scale and growing) sources.
Building first-class data products and ETL processes that interact with terabytes of data on leading platforms such as Snowflake and BigQuery.
Developing and improving foundational datasets by creating efficient and scalable data models for use across the organization.
Working with our Data Science, Analytics, CRM, and Machine Learning teams.
Owning SLA and dependency management for the data pipelines.
Writing technical documentation for data solutions and presenting at design reviews.
Resolving data pipeline failures and implementing anomaly detection.
Requirements
Experience with technologies and tooling associated with relational databases (PostgreSQL or equivalent) and data warehouses (Snowflake, BigQuery, Hadoop/Hive, or equivalent).
Experience with writing and optimizing SQL.
Experience with data modeling and ETL/ELT design, including defining SLAs, performance measurements, tuning, and monitoring.
Experience building and operating highly available, distributed systems for extracting, ingesting, and processing large data sets.
Experience with programming languages such as Python, Java, and/or Scala.
Knowledge of cloud data warehouse concepts.
Required Skills:
Advanced SQL
Snowflake, BigQuery, Hadoop/Hive, or similar tools
Data modeling and ETL/ELT design
Programming experience in a language such as Python, Java, or Scala is good to have.