Researcher, Misalignment Research

Designs worst-case demonstrations and adversarial evaluations to uncover AGI misalignment risks like deception and power-seeking. Builds automated stress-testing infrastructure and researches alignment failure modes to inform OpenAI's safety strategy. Requires 4+ years in AI red-teaming or adversarial ML.

295k – 445k | San Francisco, CA | AI Research | Onsite | 4+ YOE

About the role

Responsibilities

  • Design and implement worst-case demonstrations that make AGI alignment risks concrete for stakeholders, focused on high-stakes use cases.
  • Develop adversarial and system-level evaluations grounded in those demonstrations, driving adoption across OpenAI.
  • Create tools and infrastructure to scale automated red-teaming and stress testing.
  • Conduct research on failure modes of alignment techniques and propose improvements.
  • Publish influential internal or external papers that shift safety strategy or industry practice.
  • Partner with engineering, research, policy, and legal teams to integrate findings into product safeguards and governance processes.
  • Mentor engineers and researchers, fostering a culture of rigorous, impact-oriented safety work.

Requirements

  • Passionate about red-teaming and AI safety; think about these problems constantly and align with OpenAI's mission.
  • 4+ years of experience in AI red-teaming, security research, adversarial ML, or related safety fields.
  • Strong research track record—publications, open-source projects, or high-impact internal work—demonstrating creativity in uncovering and exploiting system weaknesses.
  • Fluent in modern ML / AI techniques and comfortable hacking on large-scale codebases and evaluation infrastructure.
  • Communicate clearly with both technical and non-technical audiences, translating complex findings into actionable recommendations.
  • Enjoy collaboration and can drive cross-functional projects that span research, engineering, and policy.

Nice-to-Haves

  • Ph.D., master’s degree, or equivalent experience in computer science, machine learning, security, or a related discipline.

Skills

AI Safety, Red-Teaming, Adversarial ML, Machine Learning, Evaluation Infrastructure, LLMs, Deceptive Alignment, Scheming Detection, Power-Seeking, ML Security

Similar roles

AI Research jobs

Researcher, Loss of Control

Designs and implements mitigation stacks to prevent loss of control risks in frontier AI models, including prevention, monitoring, detection, and enforcement. Requires expertise in deep learning, transformers, PyTorch/TensorFlow, and AI safety research.

295k – 445k | San Francisco, CA | AI Research | On-site | LLMs | PyTorch

Researcher, Synthetic RL

Develops novel reinforcement learning techniques using synthetic environments and feedback to enhance large-scale AI models. Designs experiments, analyzes dynamics, and integrates research into production systems; requires strong RL/ML background and engineering skills.

295k – 445k | San Francisco, CA | AI Research | Hybrid | Python | Research

Research Engineer / Research Scientist, Post-Training

Research and develop improvements to pre-trained models for deployment in ChatGPT and the API using reinforcement learning and product-driven approaches. Requires strong ML engineering, research experience with novel models, and ability to debug large codebases.

295k – 555k | San Francisco, CA | AI Research | Hybrid | LLMs | Python

Researcher, Pretraining Safety

Develop techniques to predict and mitigate unsafe behaviors in early-stage base models, design safer pretraining architectures, and integrate safety signals throughout training. Collaborate across safety teams to build robust, scalable safety foundations grounded in real-world risks.

295k – 445k | San Francisco, CA | AI Research | On-site | JAX | LLMs

Research Engineer, Codex

Advances AI coding models through research, experimentation, and system optimization on the Codex team. Collaborates to improve code generation, reasoning, and performance for real-world deployment.

295k – 445k | San Francisco, CA | AI Research | Hybrid | LLMs | Python