Member of Technical Staff - Research

Conducts applied research on long-horizon autonomous AI agents, focusing on evaluation, post-training, environment design, and benchmarks to improve frontier models. Builds simulations, runs experiments, ships production code, and publishes findings.

200k – 350k · San Francisco, CA · AI Research · Onsite

About the role

Responsibilities

Advance the frontier of autonomous agents through core research in long-horizon evaluation, agent post-training, and environment design.
Understand where current models fail and how to improve them.
Build benchmarks, create environments, write production code, and run rigorous experiments.
Develop advanced environment simulation engines for training & evaluating autonomous AI agents.
Investigate failure modes of frontier models.
Create rigorous benchmarks for complex, realistic tasks requiring long-horizon reasoning and tool use in dynamic environments.
Post-train agents in complex simulation environments.
Publish research.

Requirements

Strong engineering and research fundamentals; prolific user of AI tools.
Experience post-training frontier models.
Experience shipping reliable, production-quality code.
Track record of publications.

Perks

Comprehensive health, dental, and vision insurance.
401(k).
Unlimited PTO.
Free meals with the team.
Wellness stipend & learning stipend.
Top-of-the-line tech.
Frequent team activities and outings.

Skills

Reinforcement Learning, AI Agents, Simulation Environments, Post-Training, Frontier Models, Long-Horizon Reasoning, Benchmarks, Python, Machine Learning, Research Publication

Similar roles

AI Research jobs

Webflow

Senior Staff Machine Learning Scientist, Assets

Leads research in computer vision, multimodal understanding, and visual generation. Develops novel models and methodologies, translates research to production, and mentors teams. PhD preferred; requires 8+ years of experience and expertise in PyTorch, TensorFlow, and transformers.

194k – 285k · United States · AI Research · Remote · 8+ YOE · Python · PyTorch

Nuro

Senior/Staff Machine Learning Research Scientist: Generative Modeling for Planning

Develop and scale generative models such as diffusion and flow-matching for autonomous driving plan generation. Collaborate across teams to productize models for real-world deployment. Requires a PhD/MSc plus 3+ years in generative modeling and strong Python/C++ skills.

194k – 352k · Mountain View, CA · AI Research · On-site · C++ · LLMs

Databricks

Staff Research Engineer, Data Agents

Develop post-training recipes and build enterprise data agents for autonomous planning, code generation, and multi-step workflows. Requires 2+ years applied research experience shipping prototypes, plus expertise in LLMs, agents, and RL.

190k – 270k · San Francisco, CA · AI Research · On-site · 2+ YOE · LLMs · Python

Rad AI

Staff ML Research Scientist

Leads end-to-end applied ML research in NLP, LLMs, retrieval, and multimodal models for healthcare AI, driving work from experimentation to production deployment with rigorous evaluation and clinician collaboration. Requires 7+ years of experience, an MS/PhD, and depth in ML areas like PyTorch tooling.

190k – 260k · San Francisco, CA · AI Research · On-site · 7+ YOE · NLP · Ray

Scale AI

Staff Machine Learning Research Engineer, Agent Post-training - Enterprise GenAI

Develops a next-gen agent RL training platform for enterprise GenAI, integrating cutting-edge research to train state-of-the-art models for complex use cases. Requires 5+ years of LLM production experience, RLHF expertise, recent top publications, and an advanced CS degree.

218k – 273k · San Francisco, CA +2 · AI Research · On-site · 5+ YOE · PPO · RLHF