Research Engineer, Frontier Evals & Environments

Builds ambitious RL environments and evaluation systems to measure and steer frontier AI models toward safe AGI. Requires strong ML research engineering, statistical skills, and a red-teaming mindset for end-to-end project ownership in a fast-paced setting.

205k – 380k | San Francisco, CA | AI Research | Onsite

About the role

Responsibilities

Create ambitious RL environments to push our models to their limits
Work on measuring frontier model capabilities, skills, and behaviors
Develop new methodologies for automatically exploring the behavior of these models
Help steer training for our largest training runs, and see the future first
Design scalable systems and processes to support continuous evaluation
Build self-improvement loops to automate model understanding

Requirements

Passionate and knowledgeable about AGI/ASI measurement
Strong engineering and statistical analysis skills
Able to think outside the box, with a robust “red-teaming mindset”
Experienced in ML research engineering, stochastic systems, observability and monitoring, LLM-enabled applications, and/or another technical domain applicable to AI evaluations
Able to operate effectively in a dynamic and extremely fast-paced research environment as well as scope and deliver projects end-to-end

Nice-to-haves

First-hand experience in red-teaming systems, whether computer systems or otherwise
An ability to work cross-functionally
Excellent communication skills

Skills

Reinforcement Learning, Machine Learning, LLMs, Statistical Analysis, Red-Teaming, Observability, Monitoring, Stochastic Systems, RL Environments, Model Evaluation
