Our client is building an intelligent code discovery platform that gives developers the best tools to discover code in any form and to benefit through insights, recognition, and greater productivity. They are transforming code search to improve the practice of modern programming, using a graph-based approach that draws on data from the entire open source ecosystem. The mission is to build the world’s best code discovery engine. The company is well funded by top Silicon Valley investors, including the first investors in Google, Twitter, Zoom, LinkedIn, and Uber. Team members have backgrounds at NASA, LinkedIn, Facebook, Amazon, AWS, and Cisco, and at MIT, Harvard, Stanford, and Berkeley. The headquarters is in San Francisco, California, and the company is building a globally distributed, remote-first team.
For our client, we are seeking a Data Engineer to lead technology development on the frontier of code discovery and developer productivity. A successful applicant is an expert in data engineering and complex data analysis spanning natural language, code syntax, and networks. You will help the team design, test, and rapidly iterate on multiple products and services built on the client’s core technology. You will develop prototypes, tools, and methods that inform decision-making for software developers (e.g. “Is this the right solution to my coding problem?” or “How do I implement this specific code in my application?” or “What code libraries are other developers using to solve my problem?”).
The ideal candidate is a software engineer with a particular focus on data movement and orchestration. You are an explorer, looking to help developers discover code in any form and improve their productivity. You analyze complex datasets to find important signals that help developers write better code. You love collecting data from many different sources. You are interested in using machine learning to empower better software development.
Implement data engineering best practices
Design and develop data systems for machine learning
Write real-time pipelines that execute complex operations on incoming data
Experiment in ways that accelerate prototyping and maximize resource utilization
Extend existing libraries and frameworks and create new ones
Manage our data pipeline, including scheduling, dataflow programming, SQL and data labeling
Orchestrate the operation of clusters of commodity machines
Review code, mentor other engineers and support the data team
Attract, recruit and retain top engineering and scientific talent
Knowledge of microservices and cloud computing, with expertise in at least one cloud platform
Proficient with distributed systems and the orchestration of large numbers of independent commodity machines into complete, functional systems to handle diverse workloads
Minimum of 5 years of professional software engineering experience
Expertise with ETL
Proven experience as a senior data engineer or similar role
Bachelor’s or Master’s degree in computer science/engineering, mathematics, physics, or another related technical field, or equivalent practical experience
Experience with Natural Language Processing and Understanding (NLP & NLU)
Knowledge of math, probability, statistics and algorithms
Familiarity with machine learning frameworks (like Keras or PyTorch)
Advanced working knowledge of information retrieval and search technologies, including hands-on experience setting up and using open-source search systems to query and understand data
Experience with most of the following technologies:
Elasticsearch, Solr, or an equivalent search engine
Machine learning infrastructure
Any task scheduler
Any graph database
You have the opportunity to join an early-stage startup and take significant ownership of critical technical components. You will work at the highest level and collaborate with world-class advisors and technical experts. The team is growing rapidly, and we hope you’ll grow with it.
Competitive salary & equity packages
Flexible medical, dental, and vision benefits for you and your family
Unlimited vacation and sick leave
Strong remote work culture that includes group activities and local gatherings