Research Engineer, Frontier Evals & Environments

Builds ambitious RL environments and evaluation systems to measure and steer frontier AI models toward safe AGI. Requires strong ML research engineering and statistical skills, plus a red-teaming mindset, for end-to-end project ownership in a fast-paced setting.

205k – 380k · San Francisco, CA · AI Research · Onsite

About the role

Responsibilities

  • Create ambitious RL environments to push our models to their limits
  • Work on measuring frontier model capabilities, skills, and behaviors
  • Develop new methodologies for automatically exploring the behavior of these models
  • Help steer training for our largest training runs, and see the future first
  • Design scalable systems and processes to support continuous evaluation
  • Build self-improvement loops to automate model understanding

Requirements

  • Passionate and knowledgeable about AGI/ASI measurement
  • Strong engineering and statistical analysis skills
  • Able to think outside the box, with a robust “red-teaming mindset”
  • Experienced in ML research engineering, stochastic systems, observability and monitoring, LLM-enabled applications, and/or another technical domain applicable to AI evaluations
  • Able to operate effectively in a dynamic, extremely fast-paced research environment, and to scope and deliver projects end-to-end

Nice-to-haves

  • First-hand experience in red-teaming systems, whether computer systems or otherwise
  • An ability to work cross-functionally
  • Excellent communication skills

Skills

Reinforcement Learning · Machine Learning · LLMs · Statistical Analysis · Red-Teaming · Observability · Monitoring · Stochastic Systems · RL Environments · Model Evaluation

Similar roles

AI Research jobs

Machine Learning Scientist (All Levels)

Conducts machine learning research in medical NLP for conversation summarization, evidence extraction, and outcome prediction. Publishes at top AI conferences, deploys models to production, and requires MS/PhD plus strong PyTorch/TensorFlow experience.

205k – 300k · San Francisco, CA +2 · AI Research · Hybrid · JAX · PyTorch

Research Scientist

Leads original research in action-conditioned world models, physical AI, and generative modeling for embodied systems. Requires PhD in ML/CS/Robotics with top publications and expertise in generative models and large-scale training.

200k – 325k · San Francisco, CA · AI Research · On-site · DPO · RLHF

Post-Training Research Scientist

Conducts research on post-training methodologies and performant inference for AI models, balancing pure research with applied work for production systems. Requires PhD in ML with top publications and ability to design rigorous experiments at scale.

210k – 285k · San Francisco, CA · AI Research · Hybrid · JAX · LLMs

AI Researcher, Core ML (Turbo)

Develops efficient inference engines and RL/post-training pipelines for production-scale LLMs, optimizing algorithms, systems, and performance across the stack. Requires 3+ years in ML systems/RL/inference and an advanced degree.

200k – 280k · San Francisco, CA · AI Research · On-site · 3+ YOE · DPO · vLLM

Founding ML Researcher

Shapes ML research direction for document AI and owns the end-to-end lifecycle from research to production deployment. Requires expertise in VLMs, computer vision, and unstructured data parsing; PhD preferred.

200k – 300k · San Francisco, CA · AI Research · On-site · PyTorch · Document AI