Designing and implementing large-scale, distributed data processing systems using technologies such as Apache Hadoop, Apache Spark, or Apache Flink.
Developing and optimizing data pipelines and workflows for ingesting, storing, processing, and analyzing large volumes of structured and unstructured data.
Collaborating with data scientists, data analysts, and other stakeholders to understand data requirements and translate them into technical solutions.
Building and maintaining data infrastructure, including data lakes, data warehouses, and real-time streaming platforms.
Designing and implementing data models and schemas for efficient data storage and retrieval.
Ensuring the scalability, availability, and fault tolerance of big data systems through proper configuration, monitoring, and performance tuning.
Identifying and evaluating new technologies, tools, and frameworks to improve the efficiency and effectiveness of big data processing.
Implementing data security and privacy measures to protect sensitive information throughout the data lifecycle.
Collaborating with cross-functional teams to integrate data from various sources, including structured databases, unstructured files, APIs, and streaming data.
Developing and maintaining documentation, including data flow diagrams, system architecture, and technical specifications.
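As an illustration of the pipeline work described above, here is a minimal, framework-agnostic sketch in plain Python of an ingest-transform-aggregate flow (a stand-in for what would normally run as a distributed Spark or Flink job); the record fields, event types, and input data are hypothetical:

```python
import json

# Hypothetical raw records, standing in for data ingested from files or a stream.
RAW_RECORDS = [
    '{"user_id": 1, "event": "click", "value": 3}',
    '{"user_id": 2, "event": "view", "value": 7}',
    '{"user_id": 1, "event": "click", "value": 5}',
    'not valid json',  # malformed input a real pipeline must tolerate
]

def ingest(lines):
    """Parse raw lines, skipping malformed records instead of failing the job."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue

def transform(records, event_type):
    """Keep only records of the requested event type."""
    return (r for r in records if r.get("event") == event_type)

def aggregate(records):
    """Sum values per user -- the kind of reduce step Spark would distribute."""
    totals = {}
    for r in records:
        totals[r["user_id"]] = totals.get(r["user_id"], 0) + r["value"]
    return totals

totals = aggregate(transform(ingest(RAW_RECORDS), "click"))
print(totals)  # {1: 8}
```

The generator-based stages mirror how distributed engines compose lazy transformations before a terminal aggregation materializes the result.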
Bachelor's or higher degree in Computer Science, Engineering, or a related field.
Proven experience as a big data engineer or in a similar role, with a deep understanding of big data technologies, frameworks, and best practices.
Strong programming skills in languages such as Java, Scala, or Python for developing big data solutions.
Experience with big data processing frameworks such as Apache Hadoop, Apache Spark, or Apache Flink.
Proficiency in SQL and NoSQL databases, as well as data modeling and database design principles.
Familiarity with cloud platforms and services, such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP).
Knowledge of distributed computing principles and technologies, such as HDFS, YARN, and containerization (e.g., Docker, Kubernetes).
Understanding of real-time streaming technologies and frameworks, such as Apache Kafka or Apache Pulsar.
Strong problem-solving skills and the ability to optimize and tune big data processing systems for performance and scalability.
Excellent communication and teamwork skills to collaborate with cross-functional teams and stakeholders.
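The SQL and data-modeling proficiency listed above can be sketched with a small warehouse-style example using Python's built-in sqlite3 module (a stand-in for a real warehouse engine); the star-schema table and column names are illustrative, not from any particular system:

```python
import sqlite3

# A minimal star-schema sketch: one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_user (
        user_id INTEGER PRIMARY KEY,
        country TEXT NOT NULL
    );
    CREATE TABLE fact_purchase (
        purchase_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES dim_user(user_id),
        amount REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO dim_user VALUES (?, ?)",
                 [(1, "US"), (2, "DE")])
conn.executemany("INSERT INTO fact_purchase VALUES (?, ?, ?)",
                 [(10, 1, 19.99), (11, 1, 5.00), (12, 2, 42.50)])

# Revenue per country: the fact-to-dimension join pattern a warehouse query uses.
rows = conn.execute("""
    SELECT u.country, ROUND(SUM(p.amount), 2) AS revenue
    FROM fact_purchase AS p
    JOIN dim_user AS u USING (user_id)
    GROUP BY u.country
    ORDER BY u.country
""").fetchall()
print(rows)  # [('DE', 42.5), ('US', 24.99)]
```

Separating descriptive attributes (the dimension) from measured events (the fact) is the modeling principle that keeps such aggregation queries simple and efficient at scale.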