Research Engineer, Discovery

Builds large-scale infrastructure for AI scientist training, evaluation, and deployment, resolving bottlenecks in distributed systems for scientific AGI. Requires 6+ years in infrastructure engineering with expertise in ML stacks, containers, and data pipelines.

350k – 850k | San Francisco, CA | AI Research | Hybrid | 6+ YOE

About the role

Responsibilities

  • Design and implement large-scale infrastructure systems to support AI scientist training, evaluation, and deployment across distributed environments
  • Identify and resolve infrastructure bottlenecks impeding progress toward scientific capabilities
  • Develop robust and reliable evaluation frameworks for measuring progress towards scientific AGI
  • Build scalable and performant VM/sandboxing/container architectures to safely execute long-horizon AI tasks and scientific workflows
  • Collaborate to translate experimental requirements into production-ready infrastructure
  • Develop large-scale data pipelines to handle advanced language model training requirements
  • Optimize large-scale training and inference pipelines for stable and efficient reinforcement learning

You may be a good fit if you

  • Have 6+ years of highly relevant experience in infrastructure engineering with demonstrated expertise in large-scale distributed systems
  • Are a strong communicator and enjoy working collaboratively
  • Possess deep knowledge of performance optimization techniques and system architectures for high-throughput ML workloads
  • Have experience with containerization technologies (Docker, Kubernetes) and orchestration at scale
  • Have a proven track record of building large-scale data pipelines and distributed storage systems
  • Excel at diagnosing and resolving complex infrastructure challenges in production environments
  • Can work effectively across the full ML stack from data pipelines to performance optimization
  • Have experience collaborating with other researchers to scale experimental ideas
  • Thrive in fast-paced environments and can rapidly iterate from experimentation to production

Strong candidates may also have

  • Experience with language model training infrastructure and distributed ML frameworks (PyTorch, JAX, etc.)
  • Background in building infrastructure for AI research labs or large-scale ML organizations
  • Knowledge of GPU/TPU architectures and language model inference optimization
  • Experience with cloud platforms (AWS, GCP) at enterprise scale
  • Familiarity with VM and container orchestration
  • Experience with workflow orchestration tools and experiment management systems
  • A history of working with large-scale reinforcement learning
  • Comfort with large-scale data pipelines (Beam, Spark, Dask)

Annual Salary: $350,000 – $850,000 USD

Education requirements: At least a Bachelor's degree in a related field or equivalent experience.

Skills

Kubernetes, Docker, PyTorch, JAX, AWS, GCP, Apache Beam, Spark, Dask, Distributed Systems
