# Research Engineer, Discovery
**Company:** [Anthropic](https://hotfix.jobs/companies/anthropic)
**Location:** San Francisco, CA
**Salary:** $350K-$850K
**Experience:** 6+ years
**Skills:** Kubernetes, Docker, PyTorch, JAX, AWS, GCP, Apache Beam, Spark, Dask, Distributed Systems
**Posted:** 2026-02-12
> Builds large-scale infrastructure for AI scientist training, evaluation, and deployment, resolving bottlenecks in distributed systems for scientific AGI. Requires 6+ years in infrastructure engineering with expertise in ML stacks, containers, and data pipelines.
## Job Description
## Responsibilities
- Design and implement large-scale infrastructure systems to support AI scientist training, evaluation, and deployment across distributed environments
- Identify and resolve infrastructure bottlenecks impeding progress toward scientific capabilities
- Develop robust and reliable evaluation frameworks for measuring progress toward scientific AGI
- Build scalable and performant VM/sandboxing/container architectures to safely execute long-horizon AI tasks and scientific workflows
- Collaborate to translate experimental requirements into production-ready infrastructure
- Develop large-scale data pipelines to handle advanced language model training requirements
- Optimize large-scale training and inference pipelines for stable and efficient reinforcement learning

## You may be a good fit if you
- Have 6+ years of highly relevant experience in infrastructure engineering with demonstrated expertise in large-scale distributed systems
- Are a strong communicator and enjoy working collaboratively
- Possess deep knowledge of performance optimization techniques and system architectures for high-throughput ML workloads
- Have experience with containerization technologies (**Docker**, **Kubernetes**) and orchestration at scale
- Have a proven track record of building large-scale data pipelines and distributed storage systems
- Excel at diagnosing and resolving complex infrastructure challenges in production environments
- Can work effectively across the full ML stack from data pipelines to performance optimization
- Have experience collaborating with other researchers to scale experimental ideas
- Thrive in fast-paced environments and can rapidly iterate from experimentation to production

## Strong candidates may also have
- Experience with language model training infrastructure and distributed ML frameworks (**PyTorch**, **JAX**, etc.)
- Background in building infrastructure for AI research labs or large-scale ML organizations
- Knowledge of GPU/TPU architectures and language model inference optimization
- Experience with cloud platforms (**AWS**, **GCP**) at enterprise scale
- Familiarity with VM and container orchestration
- Experience with workflow orchestration tools and experiment management systems
- A history of working with large-scale reinforcement learning
- Comfort with large-scale data pipelines (**Beam**, **Spark**, **Dask**)

**Annual Salary:** $350,000 to $850,000 USD

**Education requirements:** At least a Bachelor's degree in a related field or equivalent experience.
**Apply:** https://hotfix.jobs/jobs/research-engineer-discovery-at-anthropic-d733a39f-1f6b-4aa7-bc66-7a2a02745fa8
**Canonical:** https://hotfix.jobs/jobs/research-engineer-discovery-at-anthropic-d733a39f-1f6b-4aa7-bc66-7a2a02745fa8