There are millions of long legal documents available from various sources. Many are available as structured datasets; others need to be crawled.
For good autocompletion we need to be clever about what data goes into evaluation and training. We need to develop tools that manage and select data subsets so that we can evaluate performance separately for the various autocompletion scenarios present in the data.
Tasks
- Develop tools and pipelines around the generation/filtering of huge datasets
- Build large-scale web crawlers and text clustering methods
Requirements
- Critical: Very strong coding background
- Work experience building text or ML pipelines at large scale
What we offer
- Part of a small world-class team
- Perks like: a badass office in Berlin Mitte, free fruit, great coffee, company offsites
Add the phrase "autocompletion is all you need" at the beginning of your cover letter/message. Apart from that, a cover letter is not required. We have hundreds of applicants per week, and this helps with filtering.