Research Engineer, Interpretability

Research Engineer focused on mechanistic interpretability, building tools and infrastructure to reverse-engineer neural networks for safer AI. Requires 5+ years software experience, Python proficiency, and AI research contributions.

315k – 560k | San Francisco, CA | AI Research | Hybrid | 5+ YOE

About the role

Responsibilities

Implement and analyze research experiments, both quickly in toy scenarios and at scale in large models
Set up and optimize research workflows to run efficiently and reliably at large scale
Build tools and abstractions to support a rapid pace of research experimentation
Develop and improve tools and infrastructure to support other teams in using Interpretability’s work to improve model safety

You may be a good fit if you

Have 5-10+ years of experience building software
Are highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive with Python
Have some experience contributing to empirical AI research projects
Have a strong ability to prioritize and direct effort toward the most impactful work and are comfortable operating with ambiguity and questioning assumptions
Prefer fast-moving collaborative projects to extensive solo efforts
Want to learn more about machine learning research and its applications and collaborate closely with researchers
Care about the societal impacts and ethics of your work

Strong candidates may also have experience with

Designing a code base so that anyone can quickly code experiments, launch them, and analyze their results without hitting bugs
Optimizing the performance of large-scale distributed systems
Collaborating closely with researchers
Language modeling with transformers
GPUs or PyTorch

Representative Projects

Building Garcon, a tool that allows researchers to easily access LLM internals from a Jupyter notebook
Setting up and optimizing a pipeline to efficiently collect petabytes of transformer activations and shuffle them
Profiling and optimizing ML training, including parallelizing to many GPUs
Making it fast and easy to launch ML experiments and to manipulate and analyze the results
Creating an interactive visualization of attention between tokens in a language model

Skills

Python, Rust, Go, Java, PyTorch, Transformers, GPUs, Jupyter, Distributed Systems, Machine Learning
