Researcher, Loss of Control

Designs and implements mitigation stacks to prevent loss of control risks in frontier AI models, including prevention, monitoring, detection, and enforcement. Requires expertise in deep learning, transformers, PyTorch/TensorFlow, and AI safety research.

295k – 445k | San Francisco, CA | AI Research | Onsite

About the role

In this role, you will:

  • Design and implement mitigation components for loss of control risk—spanning prevention, monitoring, detection, containment, and enforcement—under the guidance of senior technical and risk leadership.
  • Integrate safeguards across product and research surfaces in partnership with product, engineering, and research teams, helping ensure protections are consistent, low-latency, and resilient as usage and model autonomy increase.
  • Evaluate technical trade-offs within the loss of control domain (coverage, robustness, latency, model utility, and operational complexity) and propose pragmatic, testable solutions.
  • Collaborate closely with risk modeling, evaluations, and policy partners to align mitigation design with anticipated failure modes and high-severity threat scenarios, including deceptive alignment, hidden subgoals, reward hacking, and attempts to evade oversight.
  • Execute rigorous testing and red-teaming workflows, helping stress-test the mitigation stack against increasingly capable and potentially subversive model behaviors—such as sandbagging, monitor evasion, exploit-seeking, unsafe tool use, or strategic deception—and iterate based on findings.

You might thrive in this role if you:

  • Have a passion for AI safety and are motivated to make cutting-edge AI models safer for real-world use.
  • Bring demonstrated experience in deep learning and transformer models.
  • Are proficient with frameworks such as PyTorch or TensorFlow.
  • Possess a strong foundation in data structures, algorithms, and software engineering principles.
  • Are familiar with methods for training and fine-tuning large language models, including distillation, supervised fine-tuning, and policy optimization.
  • Excel at working collaboratively with cross-functional teams across research, policy, product, and engineering.
  • Have significant experience designing and evaluating technical safeguards, control mechanisms, or monitoring systems for advanced AI behavior.
  • (Nice to have) Bring background knowledge in alignment, control, interpretability, robustness, adversarial ML, or related fields.

Skills

PyTorch, TensorFlow, Deep Learning, Transformer Models, LLMs, Supervised Fine-Tuning, Policy Optimization, Data Structures, Algorithms, Software Engineering
