Cloudera Big Data Engineer – IoT Cybersecurity – ITmPowered
As a Sr. Cloudera Big Data Engineer, you will create a Cloudera IoT Cyber Data Lake and data pipelines to make Medical Device and IoT cybersecurity data meaningful and actionable. Deliver Data Engineering solutions: large-scale data structures, a data lake, data pipelines, and workflows to gather, cleanse, test, and curate datasets. Wrangle data: discover, structure, clean, enrich, validate, and publish data to operational data models for data science, analytics, dashboards, and data drill-down. Ingest data from multiple data sources (Splunk, Qualys, CMDB/Asset Inventory, CyberArk, Armis, ForeScout, Patch Management, Governance and Standards data, Risk Management, Asset Management) to create accurate, high-quality operational datasets in the Enterprise Big Data Lake (100+ node Spark Hadoop). Develop Big Data pipelines using Kafka, Sqoop, Flume, Java, Scala, Hive, Pig, and Hadoop (Cloudera) to operationalize the ingest, transform, aggregate, enrich, and validate steps, with data quality and data validation built in, and to feed the dashboard data model. Work with Data Scientists, Technology, Cyber, and Executive leadership to measurably reduce cybersecurity risk across IoT/Medical Devices. Deliver runbooks, data models, maps, and dictionaries. Mentor client teams in Data Engineering.
- Deliver Data Engineering solutions and contribute to analytics/solution design and operationalization.
- Ingest, aggregate, transform, cleanse, and publish quality actionable data.
- Develop ingestion of raw data from 10-20+ data source feeds using Flume, Sqoop, Kafka, etc.
- Develop data pipelines to process data using MapReduce, Spark, Oozie, Hive, HiveQL, Pig, Java, and SQL.
- Collaborates with big data teams to transform data and integrate algorithms and models into automated processes.
- Leverages Hadoop or Spark architectures in designing and optimizing queries to build data pipelines.
- Builds data models, data marts, and ODS to support Cyber Security, Technology Risk, Data Science, business intelligence, and other internal customers.
- Be part of a fast-paced, high-impact team that works with an entrepreneurial mindset using best-of-breed tools in our Big Data environment: Cloudera, Java, SQL, R, Spark, Python, Trifacta, Arcadia.
- Develops large scale data structures and pipelines to organize, collect and standardize data that helps generate insights and addresses reporting needs.
- Integrates data from a variety of sources, assuring that they adhere to data quality and accessibility standards.
- Uses strong programming skills in Java, Python, and SQL to build robust data pipelines and dynamic systems.
- Leverage Cloudera, Parquet, partitioning, and bucketing. Develop mappers, reducers, map/reduce side joins.
- Build Data Quality into the Ingestion, ETL, aggregation, parsing, and joining processes. Build in auto data checks back to raw data sets to validate and report data quality, anomalies, nulls, etc.
- Graduate-level degree in computer science, engineering, or relevant experience in the field of Business Intelligence, Data Mining, Data Engineering, Database Engineering, Big Data (MapReduce) Programming
- 1+ year of work experience on the Cloudera platform. Cloudera Certified (CCP) Data Engineer – preferred.
- Data Pipeline Engineering on Cloudera (Kafka, Spark, Scala, Impala, Hive, Sqoop, Flume, Python, R, Java).
- 5+ years of working in the Hadoop ecosystem with Hadoop distributions (Cloudera 6.x, Hortonworks, etc.).
- Hands-on Spark / Hadoop experience (Hive, Impala, Sqoop, UDF, Oozie, MapReduce, Spark Framework, HDFS).
- Data wrangling experience leveraging Trifacta (or similar), including data structuring, ingestion, cleansing, enrichment, and validation logic on Hadoop HDFS, MapReduce, or Spark.
- Knowledge of Hadoop architecture, HDFS commands, and designing and optimizing queries against data in HDFS.
- Expertise in data development and management software: Trifacta, Arcadia, Alteryx, Tamr, Paxata, Spotfire.
- Coding experience with Java, SQL, and any of the following: Spark, Scala, Python, HQL, Pig Latin.
- Experience with data formats including Parquet, ORC or AVRO and moving data into and out of HDFS.
- Experience in streaming technologies (Kafka, Spark Streaming, etc.).
- Strong skills in programming languages (Python, Shell Scripting, Java).
- Experience working with data visualization tools such as Arcadia, Tableau, Protovis, Vega, D3.
- Experience building Exploratory Data Analysis reports such as histograms, box plots, Pareto charts, and scatter plots using a data visualization tool such as Tableau or Spotfire.
- Experience building data transformation and processing solutions.
- Possesses strong interpersonal skills to present information and communicate with non-technical users and executives.
- Working knowledge of SQL and Relational Databases and NoSQL datastores (Cassandra, MongoDB, Neo4J, …)
- Experience with data structures, algorithms, various RDBMS, data types, and primary/foreign key constraints.
About ITmPowered Consulting
ITmPowered’s Application Information and Data Management (AIDM) Practice creates solutions that cover the entire lifecycle of information utilization, from ideation through implementation. We engage with clients in building and maturing Big Data programs from the ground up: methodology design, solution engineering, data enablement, data quality, analytics, and BI dashboarding of Big Data ecosystems. We offer consulting and development services to help our clients define their strategy and solution architecture. Our teams then deliver and manage high-performance big data, data warehousing, business analytics, and dashboarding applications that provide tangible business benefits. Outstanding solutions delivered.
PAY, PERKS, LOGISTICS: Denver, Colorado – 1 year Contract to Perm. W2 only. Pay and perks as follows.
ITMPOWERED COMP BENEFITS SUMMARY
- Estimated Pay Range: $60-70/hr – actual pay depends upon experience level – talk to your recruiter.
- Med/Dent/Vis – Pre-tax group rates. We contribute $300/mo to premiums; the employee pays the rest.
- Med – United Health Care – 4 plans – 1 HMO, 2 PPOs, 1 HSA/PPO
- Dent – Humana – Good Orthodontia plan
- Vis – VSP – has the largest network.
- 401k – Match 4% (100% of first 2%, then 50% of next 6%). Vesting 2yrs w/ 1000hrs/yr
CLIENT PERM COMP BENEFITS SUMMARY UPON CONVERSION TO PERM
- Expected Base Salary Range: $120-140k/yr + 10% Bonus annually
- PTO: Generous PTO Program
- Outstanding Med/Dent/Vision package for employee and family
- 403b w/ 5% match retirement plan
- Pension – YES! One of the few companies who still offer one.
- Training / Education reimbursement – Amazing training platforms.
- Local Denver resources only. No relocation provided.
- Will be primarily remote but must be able to come into the DTC office periodically after COVID abates.
- COVID-19 – Must be fully vaccinated OR provide medical or religious exemption
- W2 only – No sub vendors. Sponsorship NOT available.
- Must be able to pass a 10 year background check and 12 panel drug screen.
ITmPowered’s Commitment to Diversity, Equity, and Inclusion:
At ITmPowered Consulting, we are committed to fostering a diverse, equitable, and inclusive working environment where we value and develop employees of all backgrounds and experiences. We firmly believe collaboration among team members with varied pasts and perspectives generates more incisive and deeper insights that better serve our clients, employees, and community. We aim to fulfill the following goals:
- Recruit, develop and retain talented employees with diverse backgrounds and experiences
- Enhance employee engagement
- Expand community engagement and impact