At Storytel we believe that powerful stories add an extra dimension to life. We offer hundreds of thousands of audiobooks and ebooks to customers in more than 20 markets, with several new markets launching in the coming year. Storytel is Northern Europe's largest audiobook streaming service and we’re now looking for a Machine Learning Engineer to join the team! Storytel’s vision is to make the world a more empathic and creative place with great stories to be shared and enjoyed anytime, anywhere by anyone.
About the Team
The role is in the Speech team, a part of the larger Intelligence group which houses our machine learning and data science teams. In the Speech team we build services that enable Storytel to efficiently generate new content and understand existing content. In particular, our team owns the entire Text-to-Speech (TTS) stack at Storytel, from data curation to modelling decisions, training and deployment infrastructure. To accelerate development and get the system into production, we are growing our team. Since the team is new, each position we're hiring for is considered essential. Our new team members will be expected to take on large responsibilities and will impact all aspects of our work. While each role's main responsibilities are different, we will all work closely together to achieve our big ambitions of highly automated and prosodically rich speech synthesis.
We are an international company with colleagues in the larger Intelligence team in Stockholm, Barcelona and Copenhagen. The Speech team is currently based in Stockholm, and while we hope to keep building the team in the Stockholm offices, we are open to working with the right candidates to find a solution that is great for both parties.
About the Role
As a Machine Learning Engineer focusing on Deep Learning you will have an essential role in developing TTS and other speech technology applications. While we work together in the team on many parts of our services, you will have a large responsibility for model implementation, our training infrastructure and the way we package our artifacts for serving. We also expect you to help improve how we use and build datasets for TTS, e.g. setting up self-/semi-supervised learning for various models in our stack or working with active learning to improve the annotated part of our data. To do this well we believe that you will also need to stay up to date with developments in deep learning and interact with the research and open source communities.
- Work together with the team to implement models, training and deployment of speech services, in particular TTS.
- Take a large role in defining our training architecture and implementing training loops and models.
- Increase our iteration speed by improving training efficiency through distributed training and other optimizations.
- Improve how we use our data by e.g. setting up self-/semi-supervised learning for more parts of our stack, introducing active learning and making our annotation process more automated.
- Improve the text handling stages of our TTS stack by designing context-based models for problems like text normalization and grapheme-to-phoneme (g2p) conversion.
- Ensure that we build and version our artifacts. Package them so that they may be used in our deployments.
You understand that in practice, deep learning requires more engineering work than advertised, and that a significant amount of the work lies in everything from data preparation to versioning and deployment. Your strong understanding of neural networks helps you debug and improve networks and understand various test-time tradeoffs. In this rapidly evolving field you enjoy keeping up to date with new methods and might even have a few favorite applications that you follow closely.
To be successful in this role we believe that you have
- An MSc degree, or similar, in Speech Technology, Machine Learning, Computer Science, Mathematics, Physics, or a related field
- Research or industry experience with Deep Learning
- Experience getting deep neural networks into production
- Good knowledge of recent developments in at least one of: Self-/Semi-Supervised Learning, Active Learning, Distributed Training, Generative Models, Flow-based models, GANs, Autoregressive models, Transformers, Audio, Speech
- In-depth understanding of one of the frameworks: TensorFlow, PyTorch or JAX
- Strong Python development knowledge
- Excellent written and spoken English
While not required, we would also love to hear about any of:
- Experience working with deep learning in any of the following applications: natural language processing, speech processing, audio recognition, audio generation, content generation.
- Experience with large scale training with multi-GPU / TPU setups and setting up distributed training (towered or multi-node).
- Familiarity with tools for serving deep learning artifacts, such as TensorFlow Serving, TorchServe, TensorRT (Triton) etc.
- Contributions to open source frameworks or tools for Deep Learning.
- Experience using tools like Google Cloud, Docker, Kubernetes or Kubeflow.
- Publications at top-tier ML conferences like ICASSP, Interspeech, ACL, NeurIPS, ICLR, ICML, AAAI.
What we offer
- Participate in developing a top-notch streaming entertainment platform used by over a million users worldwide
- Plenty of autonomy and responsibility
- Your own yearly education budget
- A workplace that values creativity and personal initiative
- Unlimited audiobooks and ebooks from our own service
- An international team of super-talented colleagues
- Explore, work with and implement some of the newest and hottest technologies
- A company full of book lovers
- Ability to work from any of our offices in Umeå, Stockholm, Karlstad, Lund and Copenhagen
Does this sound like you? If you feel like Storytel is a place where you could thrive, let us know and we will contact you as soon as possible.