AWS Data Engineer - Remote
ClearScale is a leading cloud system integration company and AWS Premier Consulting Partner providing a wide range of cloud services including cloud consulting, architecture design, migration, automation, application development, and managed services.
We help Fortune 500 enterprises, mid-sized businesses, and startups in verticals like Healthcare, Education, Financial Services, Security, Media, and Technology succeed with ambitious, challenging, and unique cloud projects. We architect, develop, and launch innovative and sophisticated solutions using cutting-edge cloud technologies.
ClearScale is growing quickly and there is high demand for the services we provide. Clients come to us for our deep experience with Big Data, Containerization, Serverless Infrastructure, Microservices, IoT, Machine Learning, DevOps, and more.
ClearScale is looking for an experienced Data Engineer to participate in a custom data pipeline development project.
- Migrate data from a multitude of data stores into the data lake
- Orchestrate ETL processes that transform the data and slice it into the various data marts
- Manage access to the data through AWS Lake Formation
- Build a data delivery pipeline that ingests high-volume real-time streams, detects anomalies, computes windowed analytics, and writes the results to Elasticsearch for dashboard consumption
- Analyze, scope, and estimate tasks, identify technology stack and tools
- Design and implement optimal architecture and migration plan
- Develop new solution modules and re-architect existing ones; redesign and refactor program code
- Specify the infrastructure and assist DevOps engineers with provisioning
- Examine performance and advise on necessary infrastructure changes
- Communicate with the client on project-related issues
- Collaborate with in-house and external development and analytical teams
Requirements:
- Hands-on experience designing efficient architectures for high-load, enterprise-scale applications or ‘big data’ pipelines
- Hands-on experience with AWS data toolsets including but not limited to DMS, Glue, DataBrew, EMR, and SCT
- Practical experience in implementing big data architecture and pipelines
- Hands-on experience with message queuing, stream processing, and highly scalable ‘big data’ stores
- Advanced knowledge and experience working with SQL and NoSQL databases
- Proven experience redesigning and re-architecting large, complex business applications
- Strong self-management and self-organizational skills
- Successful candidates should have experience with some of the following software and tools (not all are required at the same time):
  - Python and PySpark: strong knowledge, especially in developing Glue jobs
  - Big data tools: Kafka, Spark, Hadoop (HDFS 3, YARN 2, Tez, Hive, HBase)
  - Stream-processing systems: Kinesis Data Streams, Spark Streaming, Kafka Streams, Kinesis Data Analytics
  - AWS cloud services: EMR, RDS, MSK, Redshift, DocumentDB, Lambda
  - Message queue systems: ActiveMQ, RabbitMQ, AWS SQS
  - Federated identity services (SSO): Okta, Amazon Cognito
- We are looking for a candidate with 5+ years of experience in a Data, Cloud, or Software Engineering role who holds a degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field
Nice to have:
- Experience using Apache Hudi with AWS data lakes
- 3+ years of graph database development and optimization: Neo4j, SPARQL, Gremlin, TinkerPop, Pregel, Cypher, Amazon Neptune, knowledge graphs
- Valid AWS certifications would be a great plus