Job Description
We are seeking a motivated and detail-oriented Data Engineer to join our team as a full-time employee. As a Data Engineer, you will play a pivotal role in designing, building, and maintaining distributed data pipelines and data processing systems. The ideal candidate should be proficient in Python and have a strong background in handling large-scale datasets. You will work closely with ML Engineers, full-stack engineers, and internal and external customers to ensure the success of our patent search platform. This role reports directly to the CTO and encompasses a mix of responsibilities in big data processing and analysis.
Responsibilities:
- Create and optimize big data processing pipelines for text information extraction, cleaning, and annotation.
- Merge information from various data sources to produce enhanced datasets for patent search and analysis.
- Identify and address data quality issues by collaborating with data source owners and implementing preventive or corrective measures.
- Design and maintain catalog databases for data status and traceability to ensure data integrity.
- Keep data indices up to date with all data sources and ensure data accessibility for our users.
- Assist the Machine Learning team in deploying ML models in massively distributed cloud environments.
- Support our development teams in big data tasks, providing code and infrastructure on AWS.
Skills and Qualifications:
- Proven hands-on experience in building and optimizing distributed data pipelines using Python and the Ray library (ray.io).
- Strong understanding of Big Data processing frameworks.
- Strong skills in developing asynchronous, distributed, and multiprocessing/multithreaded algorithms.
- Familiarity with cloud-based data solutions in AWS.
- Solid knowledge of data parsing techniques and data serialization formats.
- Proficiency in NLP techniques and frameworks for document annotation and extraction.
- Strong problem-solving skills and ability to work independently and as part of a team.
- Excellent communication and collaboration skills to effectively work with various stakeholders.
- Familiarity with development technologies in Linux and CI/CD practices.
What We Offer:
- Join an amazing team in a relaxed and fun environment.
- 100% remote work with a flexible schedule.
- Be part of an international team with members in the US and Barcelona.
- Continuous education and knowledge exchange, including attending conferences.
Location and Working Hours:
- This is a remote position based in either the US or Barcelona.
- Barcelona-based employees have access to a co-working space for collaborative working.
- You must be able to work in the US East Coast or Central European time zone.
- A background check is required.
Company Description
We are Ensemble IP, a forward-thinking team of AI/Machine Learning experts and Intellectual Property (IP) industry veterans building a new approach to using patent information to foster innovation. Quartet is our newly introduced, industry-leading, AI-enabled patent search platform, which lets users easily access and research over 145 million patent documents from more than 75 patenting authorities throughout the world. Our team of highly skilled patent analysts works directly with patent practitioners at top law firms and corporations worldwide to support legal and business decisions.
We have a fast-paced, collaborative culture in which every one of us is an integral part of the mission, and we believe in working together to meet our goals. We enjoy our work and are passionate about our mission; we have fun each day and take satisfaction in a job well done. We are a virtual organization and use technology platforms to communicate and work together effectively across locations and time zones.