Data Engineer

Location: New York, NY

*** Mention DataYoshi when applying ***

Company Overview

Graphika’s mission is to map the world’s social media networks, so that online spaces can thrive authentically. We create large-scale, in-depth maps of social media landscapes and conversations to help businesses discover how communities form online and how influence and information flow within large-scale networks. Founded on the pioneering work of social and network science scholars, Graphika uses deep learning to identify the people, connections, messages, and conversations online that matter for customers ranging from Facebook and Twitter to T-Systems and ABinBev. Our interdisciplinary team uses our unique, patented set of technologies and tools to create and apply new, rigorous analytical methods that enable customers to engage meaningfully in online communities, whether to grasp opportunities or to mitigate risk. Our SaaS platform enables customers to scale these abilities across various use cases, from crisis communications and proactive narrative monitoring to tracking the spread of disinformation through audiences. Our work has informed the U.S. Senate Select Committee on Intelligence on election interference.

About the Role

Graphika seeks an experienced data engineer to join our technology team. The technology department at Graphika builds the tools that drive our cutting-edge analysis platform. We work with large-scale network analysis algorithms (clustering, network decomposition, network visualization) and streaming data to tackle interesting questions in new ways. As a data engineer, you will contribute to building and scaling our various near-real-time data pipelines. You will also collaborate with other members of the team, including analysts, data scientists, other engineers, and the product team, to help plan and implement solutions. This job is not, however, an analyst or data scientist role that contributes directly to the highly publicized reports Graphika generates. Instead, it ensures that the data underlying those reports, and any further scientific discovery, is robust and clean, so that both can be built on it with integrity.

Areas of Responsibility

  • Help create and optimize large-scale batch and real-time data pipelines that ingest large quantities of structured and unstructured data from a variety of sources
  • Actively own systems which support diverse applications across Graphika teams
  • Design and implement ETL processes through cloud-based solutions
  • Share ownership in ensuring the quality of our data and data infrastructure
  • Consistently test code and systems for robustness
  • Strategize around new data storage solutions and support existing ones

Ideal Candidate Profile

You have demonstrated the ability to build, deploy and maintain large-scale, data-driven solutions. You love to take on complex data-related problems, and can work independently. You have the skills and desire to interrogate data sets to understand their various issues, and respond accordingly. You have a working knowledge of computer science fundamentals like algorithms, data structures, and time complexity. You can imagine and design architectural solutions at scale. You think beyond the task at hand and try to understand the 'why' behind what you are doing. You can maintain a focus on delivering software products, understanding that done is often preferable to perfect.

You are an enthusiastic teammate who engages in collaboration and proactive discussion. You are an effective communicator who can explain technical concepts to product leaders, customer support, and other engineers. You work with confidence and without ego. You have deep knowledge and exercise a high degree of ownership in your daily work. You have loosely held, defensible ideas, and advocate for what you believe is right. You can surface your unarticulated assumptions. You are also adept at identifying and evaluating trade-offs, willing to be proven wrong, and quick to support your fellow teammates.

Requirements:



  • Experience writing production-quality software in Python that is understandable, testable, and written with an eye toward maintainability
  • Familiarity with AWS services: S3, Lambda, Kinesis, SQS, or similar cloud-based tools
  • Knowledge of SQL and common relational database systems (PostgreSQL and MySQL)
  • Well-informed about data storage solutions
  • Familiarity with schema design for a variety of domains
  • Comfort with designing and scaling massive munging efforts on unstructured data
  • Knowledge of trade-offs between different distributed systems architectures
  • Experience with the Python data science stack (numpy, pandas, matplotlib, sklearn, Jupyter)
  • Dedication to code quality, automation, and operational excellence (unit/integration tests, scripts, workflows)
  • Ability to work legally in the US without visa sponsorship

Nice to have:

  • Experience with implementing engineering systems to support ML / NLP algorithms
  • Experience with social media data sources and formats
  • Knowledge of and ability to interact with DevOps tooling (Terraform, Ansible, Packer, Docker)
  • Knowledge of NoSQL technologies like Redis
  • Experience with Apache Spark

All Graphika Tech Team Members...

  • Understand and appreciate good software engineering practices, including version control, code reviews, testing, and refactoring
  • Are comfortable debugging and optimizing code
  • Write tests to make sure code is reliable
  • Help shape technical decisions within the team
  • Collaborate within and across departments to ensure successful product creation
  • Have the ability to pick up new tools and technologies as needed

Education and Salary

Bachelor's degree or equivalent work experience

Salary commensurate with experience

Benefits


  • Unlimited Paid Time Off (PTO), with a company-mandated minimum of 10 days of vacation time taken per year; there is also a minimum of 12 Graphika holidays
  • 100% healthcare (health, vision, dental) premium coverage for employees; 50% premium coverage for families
  • 401(k)
  • Access to Samata Health Therapy Platform, with one free session for all Graphikans
  • Remote personal office setup stipend + 20% of home internet costs covered

Location and Hours

Prior to COVID-19, Graphika's technology team was mostly co-located in our NYC office; it has since become fully distributed and remains so today. As a result, we are happy to consider applicants who are located in the continental US, with the caveat that the technology team works on Eastern time and begins its day at around 10 AM. Daily standup is at 10:30 AM Eastern time. We are interested in both full- and part-time employees.

