# Software Engineer, Data Infrastructure
**Company:** [Scale AI](https://hotfix.jobs/companies/scale-ai)
**Location:** New York, NY; Washington, DC
**Salary:** $186K-$233K
**Experience:** 5+ years
**Skills:** Rust, Go, C++, Python, Java, Spark, Distributed Systems, Data Modeling, Batch Processing, Information Retrieval
**Posted:** 2026-06-22
> Architect and build foundational data infrastructure for massive simulation outputs. Design novel data models and high-throughput pipelines to feed LLMs with structured context from complex, state-based environments.
## Key Responsibilities
- Architect the Data Ensemble: Design and implement the architecture that combines multiple sources of injected context (deeply structured simulation data, historical game states, and dynamic user inputs) into a unified, highly queryable format optimized for LLM consumption.
- Massive Batch Infrastructure: Build highly scalable, resilient data architectures from scratch. Optimize for moving, transforming, and processing massive quantities of simulation output data in large batch jobs while keeping latency low enough for rapid wargame iterations.
- Complex Data Modeling: Design sophisticated, highly relational data models that accurately represent massive, state-based simulation environments, making them easily interpretable by machine learning models.
- First-Principles Problem Solving: Navigate highly ambiguous product requirements to design custom, ground-up systems where existing open-source or enterprise tools simply cannot handle the structural complexity or scale.
- Technical Leadership: Set the technical standard for the data infrastructure team, driving rigorous code quality, system performance, and architectural clarity.

## Requirements
- 5+ years of backend or data infrastructure experience, operating at a Senior, Staff, or Principal level.
- Deep, expert-level proficiency in systems languages (e.g., Rust, Go, or C++) or in highly optimized Python/Java and Spark, plus a fundamental understanding of memory management, compute limits, and distributed systems architecture.
- Proven track record of processing massive datasets, including optimizing large batch jobs and parallel processing across distributed simulation nodes without sacrificing speed.
- Expert at surfacing the right needle in a haystack of data to feed decision-making engines. Backgrounds in Search & RecSys, Gaming / MMOs, or High-Frequency Trading (HFT) are highly valued.
- Strong desire to build robust, foundational technology that supports national security and defense modernization.

## Nice to Have
- Active Secret or TS/SCI clearance, or eligibility and willingness to obtain one.
- Experience with LLM context optimization, vector embeddings, or agentic AI frameworks (e.g., advanced RAG architectures).
- Deep domain experience working with wargaming data, complex systems modeling, or distributed simulation protocols.
- Previous experience in a high-growth, 0-to-1 startup environment.

## Benefits
- Comprehensive health, dental, and vision coverage
- Retirement benefits
- Learning and development stipend
- Generous PTO
- Commuter stipend (role-dependent)

**Apply:** https://hotfix.jobs/jobs/software-engineer-data-infrastructure-at-scale-ai-753367fe-43ca-4551-898b-77247ddd27d8
**Canonical:** https://hotfix.jobs/jobs/software-engineer-data-infrastructure-at-scale-ai-753367fe-43ca-4551-898b-77247ddd27d8