Yash Technologies Singapore is looking for a Data Engineer who is familiar with the Hadoop platform and can design, implement, and maintain optimal data/machine learning (ML) pipelines on the platform.
The following are the main responsibilities of the role:
- Designing and implementing fine-tuned, production-ready data/ML pipelines on the Hadoop platform;
- Driving optimization, testing and tooling to improve quality;
- Reviewing and approving high-level & detailed designs to ensure that the solution delivers on business needs and aligns with the data & analytics architecture principles and roadmap;
- Understanding business requirements and solution design to develop and implement solutions that adhere to big data architectural guidelines and address those requirements;
- Following proper SDLC (Code review, sprint process);
- Identifying, designing, and implementing internal process improvements: automating manual processes, optimizing data delivery, etc;
- Building robust and scalable data infrastructure (both batch processing and real-time) to support needs from internal and external users;
- Understanding various data security standards and using appropriate security tools to apply and adhere to the required data controls for user access on the Hadoop platform;
- Supporting and contributing to development guidelines and standards for data ingestion;
- Working with data scientists and the business analytics team to assist with data ingestion and data-related technical issues;
- Designing and documenting the development & deployment flow.
The following skills and qualifications are required:
- Experience in developing REST API services using a Scala framework.
- Ability to troubleshoot and optimize complex queries on the Spark platform.
- Expertise in building and optimizing big data/ML pipelines, architectures, and data sets.
- Knowledge of data modelling, including transforming unstructured data into structured designs.
- Experience in Big Data access and storage techniques.
- Experience in estimating costs based on design and development effort.
- Excellent debugging skills across the technical stack mentioned above, including analysis of server and application logs.
- Highly organized, self-motivated, proactive, and able to propose optimal design solutions.
- Good time management and multitasking skills; able to meet deadlines both independently and as part of a team.
- Ability to analyse and understand complex problems.
- Ability to explain technical information in business terms.
- Ability to communicate clearly and effectively, both verbally and in writing.
- Strong in user requirements gathering, maintenance, and support.
- Excellent understanding of Agile Methodology.
- Good experience in Data Architecture, Data Modelling, Data Security.
Experience - Must have:
a) Scala: Minimum 2 years of experience
b) Spark: Minimum 2 years of experience
c) Hadoop: Minimum 2 years of experience (security, Spark on YARN, architecture)
d) HBase: Minimum 2 years of experience
e) Hive: Minimum 2 years of experience
f) RDBMS (MySQL / PostgreSQL / MariaDB): Minimum 2 years of experience
g) CI/CD: Minimum 1 year of experience
Experience (Good to have):
a) Spark Streaming
b) Apache Phoenix
c) Caching layer (Memcached / Redis)
d) Spark ML
e) FP (Scala cats / scalaz)
Bachelor's degree in IT, Computer Science, Software Engineering, Business Analytics, or equivalent, with at least 2 years of experience in big data systems such as Hadoop, as well as cloud-based solutions.
Interested applicants can apply here or email email@example.com