Publix is able to offer virtual employment for this position in the following states: FL, GA, AL, TN, SC, NC, and VA
Responsibilities:
Participate in Enterprise Data Warehouse and business application team projects, with a focus on gathering requirements and on the design, development, and implementation of large-scale Big Data cloud solutions
Collaborate with the EDW team, Data Architects, DBAs, and application team developers to deliver large-scale Big Data solutions
Participate in team reviews of ETL design artifacts and code, recommending changes and alternative solutions where appropriate
Coordinate with Release Management to plan and prepare testing environments and data seeding on a project-by-project basis
Follow established configuration and release management processes to ensure that all project artifacts are managed, integrated, and versioned according to standards
Provide project status updates and translate technical concepts and issues for senior management
Required Qualifications:
Bachelor’s degree in Computer Science or another analytical discipline, or equivalent experience
Minimum of seven years’ experience designing, developing, and supporting applications in an enterprise environment
Minimum of five years’ experience as a Data Engineer
Minimum of five years’ experience using AWS, GCP, or Azure cloud computing technologies
Minimum of two years’ hands-on experience with Databricks, Apache Spark, Python, SQL and relational databases, Hadoop, and managing Spark data pipelines for batch and streaming workloads
Experience tuning Databricks deployments
Experience working in a data lake environment, handling structured and unstructured data, leveraging data streaming, and developing event- and queue-driven data pipelines
In-depth knowledge of modeling and designing database schemas for read and write performance
Experience designing and developing data management and data persistence solutions for applications leveraging relational and NoSQL databases
Experience building processes that support data transformation, data structures, metadata, dependency, and workload management
Experience performing root cause analysis on data and processes to answer specific questions and identify opportunities for improvement
Experience working in a fast-paced, innovative environment
Ability to adapt to shifting priorities, demands, and timelines
Strong self-initiative with the ability to identify areas of improvement with little direction
Attention to detail with the ability to produce reliable, effective solutions
Excellent communication skills
Positive attitude and ability to work in a collaborative, energetic team environment
Experience designing large-scale distributed applications for batch, micro-batch, and real-time processing
Familiarity with sourcing on-premises data into the cloud and persisting it at scale
Familiarity with data partitioning, compaction, and open-source file formats such as Parquet, ORC, Avro, and Delta