
Research Scientist, Frontier Risk Evaluations

Designs evaluation measures, harnesses, and datasets to assess risks from frontier AI systems, including dangerous-capabilities testing. Collaborates with government agencies and publishes methodologies for policymakers. Requires 3+ years of ML experience and publications in generative AI.

197k – 247k | San Francisco, CA | New York, NY | Seattle, WA | AI Research | Onsite | 3+ YOE

About the role

Responsibilities

  • Design and build harnesses to test AI models and systems (including agents) for dangerous capabilities such as security vulnerability exploitation, CBRN uplift, and other high-risk activities.
  • Work with government agencies or other labs to collectively scope and design evaluations to measure and mitigate risks posed by advanced AI systems.
  • Publish evaluation methodologies and write technical reports for policymakers.

Requirements

  • Commitment to promoting safe, secure, and trustworthy AI deployments.
  • Practical experience conducting technical research collaboratively, including building and instrumenting ML pipelines, writing evaluation harnesses, and prototyping ideas from research literature.
  • Track record of published research in machine learning, particularly generative AI.
  • At least three years of experience addressing sophisticated ML problems in research or product development.
  • Strong written and verbal communication skills for cross-functional teams.

Nice to Have

  • Experience crafting evaluations and benchmarks, or background in data science roles related to LLM technologies.
  • Experience with red-teaming or adversarial testing of AI systems.
  • Familiarity with AI safety policy frameworks (e.g., NIST AI RMF, EU AI Act, Korea AI Basic Act).

Skills

Machine Learning, Generative AI, LLMs, ML Pipelines, Evaluation Harnesses, Red-Teaming, Adversarial Testing, AI Safety, Benchmarks, Prototyping

Similar roles

AI Research jobs

Research Scientist, Agent Robustness

Focuses on agent robustness, developing tests, exploits, and mitigations for safe AI agents. Requires 3+ years ML experience, RL techniques like RLHF/DPO, and published research in generative AI.

197k – 247k | San Francisco, CA +1 | AI Research | Hybrid | 3+ YOE | DPO | RLHF

Research Scientist, AI Controls and Monitoring

Designs methods, systems, and experiments for AI controls and monitoring to ensure alignment in high-stakes environments, including real-time tracking, fail-safes, and red-team simulations. Requires 3+ years ML experience, published research in generative AI, and strong prototyping skills.

197k – 247k | San Francisco, CA +1 | AI Research | Hybrid | 3+ YOE | DPO | RLHF

Lead Quantum Device Theorist

Leads theoretical modeling of superconducting quantum processors, focusing on noise sources, gate operations, and error correction to enhance qubit performance. Requires PhD in Physics or related field with 5+ years experience in circuit QED and quantum simulations.

195k – 225k | Berkeley, CA +1 | AI Research | On-site | 5+ YOE | Stim | QuTiP

Research Scientist

Leads original research in action-conditioned world models, physical AI, and generative modeling for embodied systems. Requires PhD in ML/CS/Robotics with top publications and expertise in generative models and large-scale training.

200k – 325k | San Francisco, CA | AI Research | On-site | DPO | RLHF

AI Researcher, Core ML (Turbo)

Develops efficient inference engines and RL/post-training pipelines for production-scale LLMs, optimizing algorithms, systems, and performance across the stack. Requires 3+ years in ML systems/RL/inference and advanced degree.

200k – 280k | San Francisco, CA | AI Research | On-site | 3+ YOE | DPO | vLLM