
Software Engineer, Data Infrastructure

Architect and build foundational data infrastructure for massive simulation outputs. Design novel data models and high-throughput pipelines to feed LLMs with structured context from complex, state-based environments.

186k – 233k · New York, NY · Washington, DC · Data Engineering · Onsite · 5+ YOE

About the role

Key Responsibilities

  • Architect the Data Ensemble: Design and implement the architecture that combines multiple sources of injected context (deeply structured simulation data, historical game states, and dynamic user inputs) into a unified, highly queryable format optimized for LLM consumption.
  • Massive Batch Infrastructure: Build highly scalable, resilient data architectures from scratch. Optimize the movement, transformation, and processing of massive quantities of simulation output data through large batch jobs, while keeping latency low enough to support rapid wargame iterations.
  • Complex Data Modeling: Design sophisticated, highly relational data models that accurately represent massive, state-based simulation environments, making them easily interpretable by machine learning models.
  • First-Principles Problem Solving: Navigate highly ambiguous product requirements to design custom, ground-up systems where existing open-source or enterprise tools simply cannot handle the structural complexity or scale.
  • Technical Leadership: Set the technical standard for the data infrastructure team, driving rigorous code quality, system performance, and architectural clarity.

Requirements

  • 5+ years of backend or data infrastructure experience, operating at a Senior, Staff, or Principal level.
  • Deep, expert-level proficiency in systems languages (e.g., Rust, Go, or C++) or highly optimized Python/Java, plus distributed processing frameworks such as Spark, along with a fundamental understanding of memory management, compute limits, and distributed systems architecture.
  • Proven track record of processing massive datasets, including optimizing large batch jobs and parallel processing across distributed simulation nodes without sacrificing speed.
  • Expert at surfacing the right needle from an ocean of hay: retrieving the most relevant signal from vast amounts of data to feed decision-making engines. Backgrounds in Search & RecSys, Gaming/MMOs, or High-Frequency Trading (HFT) are highly valued.
  • Strong desire to build robust, foundational technology that supports national security and defense modernization.

Nice to Have

  • Active Secret or TS/SCI clearance, or eligibility and willingness to obtain one.
  • Experience with LLM context optimization, vector embeddings, or agentic AI frameworks (e.g., advanced RAG architectures).
  • Deep domain experience working with wargaming data, complex systems modeling, or distributed simulation protocols.
  • Previous experience in a high-growth, 0-to-1 startup environment.

Benefits

  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • Learning and development stipend
  • Generous PTO
  • Commuter stipend (role-dependent)

Skills

Rust · Go · C++ · Python · Java · Spark · Distributed Systems · Data Modeling · Batch Processing · Information Retrieval
