Job description

The Data Reliability Engineering team for Disney Streaming Services (DSS) is responsible for maintaining and improving the reliability of DSS’ big data platform, which processes hundreds of terabytes of data and billions of events daily. We are looking for a Data Reliability Engineer to help us in our ongoing mission of delivering an outstanding service to our users and making DSS more data-driven. Additionally, you will work closely with our partner teams on our incident management process, including post-mortems, root cause analysis and prevention of incident recurrence. You will be making an outsized impact in an organization that values data as its top priority. Are you passionate about reliability engineering, automation and software delivery excellence? If that is the case, we believe this is the role for you.


Responsibilities:


  • You will work closely with your counterparts on the Data Reliability Engineering team and our partner teams to improve automation, resiliency and maintainability of our data systems. 20%
  • You will build solutions to continually improve our software release and change management process using industry best practices to help DSS maintain legal compliance. 20%
  • You will help to design and build systems that improve the reliability, resiliency and maintainability of our big data systems and products. 20%
  • You will help to build out observability and intelligent monitoring of data pipelines and infrastructure to achieve early and automated anomaly detection and alerting. 10%
  • You will be on-call to respond to system failures, alerts and/or other issues that prevent critical systems from running. 20%
  • You will plan service capacity and testing automation, design business continuity and disaster recovery plans and processes, and work with the engineering team on implementation. 10%


Basic Qualifications:


  • 2+ years of experience working in Linux environments, and proficiency with cloud environments (AWS)
  • Experience coding in one or more of the following programming languages: Python, Java, or Scala
  • Highly proficient in SQL
  • 2+ years of hands-on experience in Reliability Engineering for high-performance, scalable and distributed data systems with a focus on automation


Additional Qualifications:


  • Detailed problem-solving approach, coupled with a strong sense of ownership and drive
  • A strong bias to action and a passion for delivering high-quality data solutions
  • Understanding of CI/CD principles and familiarity with version control systems (Git)
  • Attention to detail and quality, with excellent problem-solving and interpersonal skills


Education:


  • BS/MS in Computer Science, Information Management or related field
