
Research Engineer, Codex

Advances AI coding models through research, experimentation, and system optimization on the Codex team. Collaborates to improve code generation, reasoning, and performance for real-world deployment.

295k – 445k · San Francisco, CA · AI Research · Hybrid

About the role

Responsibilities

  • Design and run experiments to improve code generation, reasoning, and agentic behavior in Codex models.
  • Develop research insights into model training, alignment, and evaluation.
  • Hunt down and address inefficiencies across the Codex system stack—from agent behavior to LLM inference to container orchestration—and land high-leverage performance improvements.
  • Build tooling to measure, profile, and optimize system performance at scale.
  • Work across the stack to prototype new capabilities, debug complex issues, and ship improvements to production.

Requirements

  • Excited to explore and push the boundaries of large language models, especially in the domain of software reasoning and code generation.
  • Have strong software engineering skills and enjoy quickly turning ideas into working prototypes.
  • Think holistically about performance, balancing speed, cost, and user experience.
  • Bring creativity and rigor to open-ended research problems and thrive in highly iterative, ambiguous environments.
  • Experience operating across both ML systems and cloud infrastructure.

Skills

LLMs, Code Generation, ML Systems, Cloud Infrastructure, LLM Inference, Container Orchestration, Model Training, Model Alignment, Model Evaluation, Python

Similar roles

AI Research jobs

Researcher, Misalignment Research

Designs worst-case demonstrations and adversarial evaluations to uncover AGI misalignment risks like deception and power-seeking. Builds automated stress-testing infrastructure and researches alignment failure modes to inform OpenAI's safety strategy. Requires 4+ years in AI red-teaming or adversarial ML.

295k – 445k · San Francisco, CA · AI Research · On-site · 4+ YOE · LLMs · AI Safety

Researcher, Loss of Control

Designs and implements mitigation stacks to prevent loss of control risks in frontier AI models, including prevention, monitoring, detection, and enforcement. Requires expertise in deep learning, transformers, PyTorch/TensorFlow, and AI safety research.

295k – 445k · San Francisco, CA · AI Research · On-site · LLMs · PyTorch

Researcher, Synthetic RL

Develops novel reinforcement learning techniques using synthetic environments and feedback to enhance large-scale AI models. Designs experiments, analyzes dynamics, and integrates research into production systems; requires strong RL/ML background and engineering skills.

295k – 445k · San Francisco, CA · AI Research · Hybrid · Python · Research

Research Engineer / Research Scientist, Post-Training

Researches and develops improvements to pre-trained models for deployment in ChatGPT and the API using reinforcement learning and product-driven approaches. Requires strong ML engineering, research experience with novel models, and the ability to debug large codebases.

295k – 555k · San Francisco, CA · AI Research · Hybrid · LLMs · Python

Researcher, Pretraining Safety

Develops techniques to predict and mitigate unsafe behaviors in early-stage base models, designs safer pretraining architectures, and integrates safety signals throughout training. Collaborates across safety teams to build robust, scalable safety foundations grounded in real-world risks.

295k – 445k · San Francisco, CA · AI Research · On-site · JAX · LLMs