Research Engineer, Discovery

Builds large-scale infrastructure for AI scientist training, evaluation, and deployment, resolving bottlenecks in distributed systems for scientific AGI. Requires 6+ years in infrastructure engineering with expertise in ML stacks, containers, and data pipelines.

350k – 850k | San Francisco, CA | AI Research | Hybrid | 6+ YOE

About the role

Responsibilities

Design and implement large-scale infrastructure systems to support AI scientist training, evaluation, and deployment across distributed environments
Identify and resolve infrastructure bottlenecks impeding progress toward scientific capabilities
Develop robust and reliable evaluation frameworks for measuring progress towards scientific AGI
Build scalable and performant VM/sandboxing/container architectures to safely execute long-horizon AI tasks and scientific workflows
Collaborate to translate experimental requirements into production-ready infrastructure
Develop large-scale data pipelines to handle advanced language model training requirements
Optimize large-scale training and inference pipelines for stable and efficient reinforcement learning

You may be a good fit if you

Have 6+ years of highly relevant experience in infrastructure engineering with demonstrated expertise in large-scale distributed systems
Are a strong communicator and enjoy working collaboratively
Possess deep knowledge of performance optimization techniques and system architectures for high-throughput ML workloads
Have experience with containerization technologies (Docker, Kubernetes) and orchestration at scale
Have a proven track record of building large-scale data pipelines and distributed storage systems
Excel at diagnosing and resolving complex infrastructure challenges in production environments
Can work effectively across the full ML stack from data pipelines to performance optimization
Have experience collaborating with other researchers to scale experimental ideas
Thrive in fast-paced environments and can rapidly iterate from experimentation to production

Strong candidates may also have

Experience with language model training infrastructure and distributed ML frameworks (PyTorch, JAX, etc.)
Background in building infrastructure for AI research labs or large-scale ML organizations
Knowledge of GPU/TPU architectures and language model inference optimization
Experience with cloud platforms (AWS, GCP) at enterprise scale
Familiarity with VM and container orchestration
Experience with workflow orchestration tools and experiment management systems
History of working with large-scale reinforcement learning
Comfort with large-scale data pipelines (Beam, Spark, Dask)

Annual Salary: $350,000 – $850,000 USD

Education requirements: At least a Bachelor's degree in a related field or equivalent experience.

Skills

Kubernetes, Docker, PyTorch, JAX, AWS, GCP, Apache Beam, Spark, Dask, Distributed Systems

Similar roles

AI Research jobs

OpenAI

Agent Post-Training, Frontier Evals and Environments Research

Researcher building frontier RL environments, evaluations, and training signals to steer OpenAI's largest agent training runs and measure model capabilities.

295k – 445k | San Francisco, CA | AI Research | On-site | 7+ YOE | LLMs | RLHF

Luma AI

Applied Research Scientist / Engineer

Work as a fullstack applied researcher adapting multimodal video foundation models for production. Focus on controllability, personalization, and end-user quality using SFT, RL, and data-driven refinement.

200k – 450k | New York, NY +1 | AI Research | Hybrid | 7+ YOE | RL | SFT

Decagon

Senior Research Engineer, Voice + Speech

Lead development of models and algorithms for real-time voice agents, advancing speech understanding, naturalness, and production deployment in conversational AI. Requires 5+ years in AI/ML with experience deploying LLMs.

200k – 400k | New York, NY | AI Research | On-site | 5+ YOE | LLMs | Python

EliseAI

Senior Research Scientist

Leads end-to-end research initiatives in machine learning and large language models for conversational AI in housing and healthcare. Requires PhD plus 5+ years post-PhD experience, strong ML expertise, and Python proficiency.

200k – 320k | San Francisco, CA | AI Research | On-site | 5+ YOE | R | LLMs

EliseAI

Senior Research Scientist

Leads end-to-end research initiatives in machine learning and large language models for conversational AI in housing and healthcare. Requires PhD in relevant field plus 5+ years post-PhD experience, strong ML expertise, and Python proficiency.

200k – 320k | New York, NY | AI Research | On-site | 5+ YOE | R | LLMs