Big Data Engineer – ITmPowered Consulting
The Big Data Engineer will be tasked with creating a data ecosystem that provides the right data, for the right question, at the right time, to make data meaningful and actionable. Deliver Data Engineering solutions including large-scale data structures, a data lake, and data pipelines and workflows to gather, cleanse, test, and curate datasets. Wrangle data (discover, structure, clean, enrich, validate, and publish) into an operational data model / ODS that underpins BI dashboards and enterprise-wide data drill-down/drill-up.

Ingest data from multiple sources (Splunk, Qualys, CMDB/Asset Inventory, CyberArk, Armis, ForeScout, automated patch management systems, Governance and Standards data, Risk Management data, Cyber, ITSM, etc.) to create accurate, high-quality operational datasets in the enterprise Big Data environment (a 100-node Spark/Hadoop cluster) for use by several teams. Develop high-performance Big Data pipelines using Kafka, Sqoop, Flume, Java, Scala, Hive, Pig, and Hadoop (Cloudera) to operationalize the ingestion, ETL, transforms, aggregations, joins, enrichment, and validations behind the final ODS BI dashboards, building in data quality, data validation, governance, and maintenance automation.

Work with Data Scientists, Technology, Cyber, and Executive leadership to enable better decision making across the organization. Deliver runbooks, data models, source-to-target mappings, and updated data dictionaries. Mentor and train the client teams in optimized Data Solution Engineering.
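As an illustration of the pipeline work described above, below is a minimal Spark (Scala) sketch of an ingest, cleanse, enrich, and publish flow. The HDFS paths, table names, and columns are hypothetical placeholders for this example, not client specifics.

```scala
// Minimal sketch only: paths, table names, and columns are hypothetical placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object VulnerabilityOdsPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("vulnerability-ods-pipeline")
      .enableHiveSupport()                       // publish curated output as a Hive table
      .getOrCreate()

    // 1. Ingest: a raw scanner export landed in HDFS by Sqoop/Flume (path is illustrative)
    val rawScans = spark.read
      .option("header", "true")
      .csv("hdfs:///data/raw/scanner_export/")

    // 2. Cleanse/standardize: trim the join key, normalize severity, drop rows missing the key
    val cleaned = rawScans
      .withColumn("asset_id", trim(col("asset_id")))
      .withColumn("severity", upper(col("severity")))
      .filter(col("asset_id").isNotNull)

    // 3. Enrich: join against a CMDB/asset-inventory reference table (hypothetical name)
    val assets = spark.table("reference.asset_inventory")
    val enriched = cleaned.join(assets, Seq("asset_id"), "left")

    // 4. Publish: curated ODS table backing the BI dashboards
    enriched.write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("ods.vulnerability_findings")

    spark.stop()
  }
}
```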
Responsibilities
- Deliver Data Engineering solutions and contribute to analytics/solution design and operationalization.
- Ingest, merge, prepare, test, and document curated datasets from various novel external and internal data sources for a variety of BI Dashboard and Analytics projects for IT Risk, Cyber Security, PKI, and Compliance.
- Apply data wrangling, data matching, and ETL techniques to explore a variety of data sources, gain data expertise, perform summary analyses, and curate datasets.
- Ingest, aggregate, transform, cleanse, and publish quality actionable data.
- Be part of a fast-paced, high-impact team that works with an entrepreneurial mindset using best-of-breed tools in our Big Data environment: Hadoop/Cloudera, Java, SQL, R, Spark, Python, Trifacta, Arcadia.
- Develop large-scale data structures and pipelines to organize, collect, and standardize data that helps generate insights and addresses reporting needs.
- Develop ingestion of raw data from 10-20+ data source feeds using Flume, Sqoop, Kafka, etc.
- Develop data pipelines to process data using MapReduce, Spark, Oozie, Hive, HiveQL, Pig, Java, and SQL.
- Collaborate with Big Data and Data Science teams to transform data and integrate algorithms and models into automated processes.
- Leverage Hadoop or Spark architectures in designing and optimizing queries to build data pipelines.
- Build data models, data marts, and ODS to support Cyber Security, Technology Risk, Data Science, business intelligence, and other internal customers.
- Integrate data from a variety of sources, ensuring that it adheres to data quality and accessibility standards.
- Write ETL (Extract / Transform / Load) processes, data systems, and tools for analytic processing.
- Use strong programming skills in Java, Python, and SQL to build robust data pipelines and dynamic systems.
- Leverage Cloudera, Parquet, partitioning, and bucketing. Develop mappers, reducers, and map-side/reduce-side joins.
- Build data quality into the ingestion, ETL, aggregation, parsing, and joining processes. Build in automated checks back to the raw datasets to validate and report data quality, anomalies, nulls, etc. (see the sketch after this list).
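As one illustration of the data-quality and Parquet layout responsibilities above, the following Spark (Scala) sketch reconciles a curated ODS table back to its raw feed and reports per-column null rates before writing a partitioned, bucketed copy. The table names, partition column, and bucket count are assumptions for the example only.

```scala
// Illustrative sketch: table names, partition column, and bucket count are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CuratedDataQualityChecks {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ods-dq-checks")
      .enableHiveSupport()
      .getOrCreate()

    val raw     = spark.table("staging.scanner_raw")          // hypothetical raw landing table
    val curated = spark.table("ods.vulnerability_findings")   // hypothetical curated ODS table

    // Check 1: reconcile curated row counts back to the raw feed
    val rawCount     = raw.count()
    val curatedCount = curated.count()
    println(s"raw=$rawCount curated=$curatedCount dropped=${rawCount - curatedCount}")

    // Check 2: report null percentage per column so anomalies surface automatically
    val nullReport = curated.select(curated.columns.map { c =>
      (sum(when(col(c).isNull, 1).otherwise(0)) / count(lit(1)) * 100).alias(s"${c}_pct_null")
    }: _*)
    nullReport.show(truncate = false)

    // Write the curated set partitioned and bucketed for efficient dashboard drill-downs
    curated.write
      .mode("overwrite")
      .partitionBy("scan_date")          // assumed partition column
      .bucketBy(16, "asset_id")          // bucketing requires saveAsTable rather than a plain path
      .sortBy("asset_id")
      .saveAsTable("ods.vulnerability_findings_curated")

    spark.stop()
  }
}
```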
Required skills:
- Graduate-level degree in computer science or engineering, or relevant experience in Business Intelligence, Data Mining, Data Engineering, Database Engineering, or Big Data (MapReduce) programming.
- 5+ years of experience working on the Hadoop ecosystem with Hadoop distributions (Cloudera 6.x, Hortonworks, etc.).
- Hands-on Spark / Hadoop experience (HDFS, Hive, Impala, Sqoop, UDFs, Oozie, MapReduce, Spark Framework).
- Knowledge of Hadoop architecture, HDFS commands, and designing and optimizing queries against data in HDFS.
- 3-5 years' data wrangling experience leveraging Trifacta (or similar), including data structuring, ingestion, cleansing, enrichment, and validation logic on Hadoop HDFS, MapReduce, or Spark.
- Expertise in data development and management software such as Trifacta, Arcadia, Alteryx, Tamr, Paxata, Spotfire.
- Coding experience with Java, SQL, and any of the following: Scala, Python, HiveQL, Pig Latin.
- Experience with data formats including Parquet, ORC, or Avro, and with moving data into and out of HDFS.
- Experience working with data visualization tools such as Arcadia, Tableau, Protovis, Vega, D3.
- Experience building exploratory data analysis reports such as histograms, box plots, Pareto charts, and scatter plots using a data visualization tool such as Tableau or Spotfire.
- Experience building data transformation and processing solutions.
- Ability to manipulate voluminous data with differing degrees of structure across disparate sources to build and communicate actionable insights for internal or external parties.
- Possesses strong interpersonal skills to present information and communicate with non-technical users and executives.
- Working knowledge of SQL, relational databases, and NoSQL datastores (Cassandra, MongoDB, Neo4j, etc.).
- Experience with streaming technologies (Kafka, Spark Streaming, etc.); see the streaming sketch after this list.
- Strong skills in programming languages (Python, Shell Scripting, Java).
- Experience with data structures, algorithms, various RDBMSs, data types, and primary/foreign key constraints.
- Hadoop or Cloudera Developer Certification (CDH 6.x) preferred.
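For the streaming requirement above, here is a minimal Spark Structured Streaming (Scala) sketch that lands a Kafka topic as Parquet in HDFS. The broker address, topic name, and output paths are placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

```scala
// Hedged sketch only: broker address, topic, and paths are assumed placeholders.
import org.apache.spark.sql.SparkSession

object KafkaEventIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-event-ingest").getOrCreate()

    // Read the raw event stream from Kafka (requires the spark-sql-kafka connector)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "asset-events")
      .load()
      .selectExpr("CAST(key AS STRING) AS event_key",
                  "CAST(value AS STRING) AS event_json",
                  "timestamp")

    // Land micro-batches as Parquet in HDFS for downstream batch curation
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/raw/asset_events/")
      .option("checkpointLocation", "hdfs:///checkpoints/asset_events/")
      .start()

    query.awaitTermination()
  }
}
```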
About ITmPowered Consulting
ITmPowered's Application Information and Data Management (AIDM) Practice creates solutions that cover the entire lifecycle of information utilization, from ideation through implementation. We engage with clients in building and maturing Big Data programs from the ground up: methodology design, solution engineering, data enablement, data quality, analytics, and BI dashboarding of Big Data ecosystems. We offer consulting and development services to help our clients define their strategy and solution architecture. Our teams then deliver and manage high-performance big data, data warehousing, business analytics, and dashboarding applications that provide tangible business benefits. Outstanding solutions delivered.
The Perks:
- Comprehensive medical, dental and vision plans available for you and your dependents
- 401(k) Retirement Plan with Employer Match
- Competitive Compensation
- Collaborative and cool culture
- Work-life balance
Logistics:
- Local Denver resources only. No relocation provided.
- Primarily remote, but must be able to come into the DTC office periodically after COVID abates.
- No sub vendors. Sponsorship NOT available.