Cloud Data Engineer
We are seeking a highly skilled Cloud Data Engineer with at least 5 years of experience designing, developing, documenting, and integrating applications on Big Data platforms such as Snowflake, Databricks, Hadoop, and Hive. The successful candidate will have expertise in deploying data pipelines to cloud infrastructure hosted on AWS or Azure.
Responsibilities
- Gather requirements from business and user groups to analyze, design, develop, and implement data pipelines that meet customer needs
- Process data from Azure/AWS data storage using Databricks and Snowflake
- Optimize table design and indexing for end-user ease of use as well as workload performance
- Work with various input file formats including delimited text files, log files, Parquet files, JSON files, XML files, Excel files, and others
- Develop automated ETL procedures to load data from various sources into our application’s data warehouse
- Ensure pipeline structure is standardized across customers, each of whom may have a unique input data format
- Configure monitoring systems to detect failure and performance degradation of ETL pipelines
- Work with the DevOps team to design CI/CD pipelines to conduct ETL upgrades
- Deploy and leverage cloud infrastructure and services to assist in ETL pipeline definition and automation
- Understand dimensional and relational data modeling concepts such as star-schema modeling, fact tables, and dimension tables
- Have strong knowledge of both SQL and NoSQL databases
- Collaborate with business partners, operations, senior management, etc. on day-to-day operational support
- Work with high volumes of data with stringent performance requirements
- Use programming languages like Python to clean raw data before processing, e.g., removing newline characters and delimiters embedded within fields (a minimal sketch follows this list)
- Define data quality and validation checks to preemptively detect potential issues (see the second sketch below)
- Ensure ETL pipelines are HIPAA-compliant, run with minimal permissions, and securely manage any passwords and secrets used for authentication (see the third sketch below)
- Document ETL pipeline logic, structure, and field lineage for review by both technical and non-technical audiences
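The cleaning responsibility above is the kind of step that is easy to illustrate. The following is a minimal sketch, assuming a pipe-delimited text input; the delimiter, file paths, and function names are placeholders for illustration, not part of the actual pipeline.

```python
import csv

FIELD_DELIMITER = "|"  # hypothetical delimiter; real input formats vary by customer

def clean_field(value: str) -> str:
    """Strip embedded newlines and delimiter characters that would break downstream parsing."""
    return (
        value.replace("\r", " ")
             .replace("\n", " ")
             .replace(FIELD_DELIMITER, " ")
             .strip()
    )

def clean_file(raw_path: str, clean_path: str) -> None:
    """Read a delimited text file and write a cleaned copy, field by field."""
    with open(raw_path, newline="", encoding="utf-8") as src, \
         open(clean_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=FIELD_DELIMITER)
        writer = csv.writer(dst, delimiter=FIELD_DELIMITER)
        for row in reader:
            writer.writerow([clean_field(field) for field in row])
```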
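Data quality and validation checks can take many forms; the sketch below shows one simple pre-load validation pass. The required field names (claim_id, member_id, service_date) are hypothetical and not drawn from any actual schema.

```python
from typing import Iterable, List, Mapping

REQUIRED_FIELDS = ("claim_id", "member_id", "service_date")  # hypothetical field names

def validate_rows(rows: Iterable[Mapping[str, str]]) -> List[str]:
    """Return a list of human-readable issues found in the incoming records."""
    issues = []
    row_count = 0
    for i, row in enumerate(rows, start=1):
        row_count += 1
        for field in REQUIRED_FIELDS:
            if not row.get(field):
                issues.append(f"row {i}: missing required field '{field}'")
    if row_count == 0:
        issues.append("input contained no rows")
    return issues
```

Running such checks before loading lets a pipeline fail fast, or quarantine bad records, instead of propagating issues into the warehouse.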
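On secrets handling, one common pattern consistent with the requirement above is to read credentials from the environment at runtime, typically populated by a managed secret store such as AWS Secrets Manager or Azure Key Vault, rather than hardcoding them. The variable name below is hypothetical.

```python
import os

def get_warehouse_password() -> str:
    """Read the warehouse password from the environment instead of source code or config files."""
    password = os.environ.get("WAREHOUSE_PASSWORD")  # hypothetical variable name
    if not password:
        raise RuntimeError("WAREHOUSE_PASSWORD is not set; refusing to fall back to a hardcoded secret")
    return password
```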
Requirements
- Bachelor's degree in Computer Science or a related field
- 5+ years of experience in designing, developing, documenting, and integrating applications using Big Data platforms like Snowflake and Databricks
- Extensive experience working on both Azure and AWS, ideally using native ETL tooling (e.g., Azure Data Factory)
- Strong experience in cleaning, pipelining, and analyzing large data sets
- Proficient in programming languages such as R, Python, Java, and Scala
- Experience with git for version control
- Excellent problem-solving skills and the ability to work independently and as part of a team
- Strong communication and collaboration skills, with the ability to work with stakeholders from different backgrounds and levels of expertise
About Alivia Analytics
Alivia Analytics is helping customers Achieve Healthcare Payment Integrity, Finally. By turning mountains of data into actionable answers, Alivia Analytics does the heavy lifting, delivering the accuracy, confidence, and speed our customers need to solve their healthcare payment integrity challenges. Through the Alivia Analytics Healthcare Payment Integrity Suite™, we help private and public healthcare payers achieve payment integrity globally. In the US alone, up to 10% of every dollar spent is attributed to fraud, waste, or abuse, amounting to as much as $370 billion lost annually. If your ambition is to grow your responsibilities and career while building world-class analytic SaaS systems and fixing a huge problem for social good, please come and join us.