Staff Software Engineer, Developer Productivity

405k – 485k · San Francisco, CA · New York, NY · DevOps / SRE · Hybrid · 7+ YOE · Jun 19

Summary

Staff-level IC role owning end-to-end CI/CD, merge queue, and deploy pipelines for Anthropic's engineering org. Focus on AI-assisted review, test reliability, and progressive delivery at monorepo scale.

About the role

Key responsibilities

Own the build, test, merge, and deploy pipeline end to end — what runs on each PR, what auto-approves, what gates merge, and how a change progresses to running healthy in production
Drive down and defend "time from push to healthy in prod" as a core engineering metric
Design and tune AI-assisted code review so confidence-to-land scales with PR volume
Build the deploy and release path — canary, progressive rollout, health checks, automated rollback — in partnership with the platform teams who own the underlying substrate
Improve test reliability by quarantining, root-causing, and retiring intermittent failures
Shape CI and repository topology (build graph, test targeting, scope boundaries) to match how the company actually ships
Partner with platform, delivery infrastructure, and security teams, and represent Developer Productivity in cross-org pipeline decisions
Design processes (postmortem review, incident response, on-call) that help the team operate reliably and never fail the same way twice

Minimum qualifications

Significant backend or developer-infrastructure engineering experience, with hands-on responsibility for a high-leverage CI/CD, merge queue, or land pipeline at scale
Proficiency in Python and at least one statically typed systems language (e.g., Go or Rust)
Experience operating CI/CD or release systems through production incidents, including writing postmortems and driving remediations
Demonstrated ability to work across team boundaries — building consensus with platform, security, and product engineering stakeholders
Comfort using AI coding tools as a daily part of your workflow, with informed opinions on where they provide leverage

Preferred qualifications

7+ years of backend or developer-infrastructure experience
Experience with Bazel or similar build-graph / test-targeting systems at monorepo scale
Experience with progressive delivery or release engineering at scale (canary analysis, automated rollback, health-gated promotion)
A track record of leading — or making the well-reasoned case against — a repo split, monorepo extraction, or comparable scope-boundary migration
A history of authoring engineering policy or paved-path tooling that other teams adopted voluntarily
Familiarity with Kubernetes, Buildkite, GitHub Actions, or comparable CI/deploy substrates
Interest in the safe and beneficial development of AI

Representative projects

Reducing p50 merge-to-production time by re-architecting the merge queue and test selection strategy
Building an AI-assisted review layer that auto-approves low-risk changes and routes high-risk ones to the right reviewers
Designing a flaky-test quarantine and burndown system that returned CI signal to >99% reliability
Standing up canary and progressive rollout for a service fleet, with automated rollback on health regression
Authoring the RFC and migration plan for a build-graph or repository topology change adopted across multiple teams

Skills

Python, Go, Rust, Bazel, Kubernetes, Buildkite, GitHub Actions, CI/CD, Merge Queue, Canary Deployment
