Job description

PySpark Data Engineer

Our client in Charlotte, NC is looking for hardworking, motivated talent to join their team. Our Fortune 50 Enterprise client is hiring for a hybrid PySpark Data Engineer to start right away! Don't wait, apply today!

What's in it for you?
- $85-89/hr
- Hybrid
- Full Time
- Health, Dental and Vision Insurance, 401k and other great benefits!

Job Description

Responsibilities:
The Big Data Lead Software Engineer is responsible for owning and driving technical innovation with big data technologies. The individual is a subject matter expert technologist with strong Python experience and deep hands-on experience building data pipelines for the Hadoop platform. This person will be part of successful Big Data implementations for large data integration initiatives. Candidates for this role must be willing to push the limits of traditional development paradigms typically found in a data-centric organization while embracing the opportunity to gain subject matter expertise in the cyber security domain.

- Lead the design and development of sophisticated, resilient, and secure engineering solutions for modernizing our data ecosystem, typically involving multiple disciplines including big data architecture, data pipelines, data management, and data modeling specific to consumer use cases.
- Provide technical expertise for the design, implementation, maintenance, and control of data management services, especially end-to-end, scale-out data pipelines.
- Develop self-service, multitenant capabilities on the cyber security data lake, including custom/off-the-shelf services integrated with the Hadoop platform; use APIs and messaging to communicate across services; integrate with distributed data processing frameworks and data access engines built on the cluster; integrate with enterprise services for data governance and automated data controls; and implement policies to enforce fine-grained data access.
- Build, certify, and deploy highly automated services and features for data management (registering, classifying, collecting, loading, formatting, cleansing, structuring, transforming, reformatting, distributing, and archiving/purging) through the Data Ingestion, Processing, and Consumption stages of the analytical data lifecycle.
- Provide the highest technical leadership in the design, engineering, deployment, and maintenance of solutions through collaborative efforts with the team and third-party vendors.
- Design, code, test, debug, and document programs using Agile development practices.
- Review and analyze complex data management technologies that require in-depth evaluation of multiple factors, including intangibles or unprecedented factors.
- Assist in production deployments, including troubleshooting and problem resolution.
- Collaborate with enterprise, data platform, data delivery, and other product teams to provide strategic solutions, influencing long-range internal and enterprise-level data architecture and change management strategies.
- Provide technical leadership and recommendations into the future direction of data management technology and custom engineering designs.
- Collaborate and consult with peers, colleagues, and managers to resolve issues and achieve goals.

Required Qualifications:
- 7+ years of Big Data Platform (data lake) and data warehouse engineering experience demonstrated through prior work experience, preferably with the Hadoop stack: HDFS, Hive, SQL, Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Atlas, Flink, Cloudera Manager, Airflow, Impala, Tez, Hue, and a variety of source data connectors.
- 3+ years of hands-on experience building modern, resilient, and secure data pipelines, including movement, collection, integration, and transformation of structured/unstructured data with built-in automated data controls, built-in logging/monitoring/alerting, and pipeline orchestration managed to operational SLAs; preferably using Airflow, DAGs, and connector plugins.
- 5+ years of strong Python and other functional programming skills.

Desired Qualifications:
- Hands-on experience developing and managing technical and business metadata
- Experience creating/managing time-series data from full data snapshots or incremental data changes
- Hands-on experience implementing fine-grained access controls, such as Attribute-Based Access Controls using Apache Ranger
- Experience automating DQ validation in data pipelines
- Experience implementing automated data change management, including code and schema versioning, QA, CI/CD, and rollback processing
- Experience automating the end-to-end data lifecycle on the big data ecosystem
- Experience managing automated schema evolution within data pipelines
- Experience implementing masking and/or other forms of data obfuscation
- Experience designing and building microservices, APIs, and MySQL
- Advanced understanding of SQL and NoSQL DB schemas
- Advanced understanding of partitioned Parquet, ORC, Avro, and various compression formats
- Experience developing containerized microservices and APIs
- Google Cloud data services experience (bonus)
- Familiarity with key concepts implemented by Apache Hudi, Apache Iceberg, or Databricks Delta Lake (bonus)

What are the top 3 must-have skills?
- PySpark
- Airflow
- Hadoop
- Python

Why should you choose Experis?
- Medical, Dental, Vision, 401k
- Weekly pay with direct deposit
- Consultant Care support
- Free training to upgrade your skills
- Dedicated Career Partner to help you achieve your career goals

Are you interested? Share this job with friends and family and earn dollars with every successful hire.

Please let the company know that you found this position on this Job Board as a way to support us, so we can keep posting cool jobs.