Staff Software Engineer, Developer Productivity

405k – 485k | San Francisco, CA | New York, NY | DevOps / SRE | Hybrid | 7+ YOE

Summary

Staff-level IC role owning end-to-end CI/CD, merge queue, and deploy pipelines for Anthropic's engineering org. Focus on AI-assisted review, test reliability, and progressive delivery at monorepo scale.

About the role

Key responsibilities

  • Own the build, test, merge, and deploy pipeline end to end — what runs on each PR, what auto-approves, what gates merge, and how a change progresses to running healthy in production
  • Drive down and defend "time from push to healthy in prod" as a core engineering metric
  • Design and tune AI-assisted code review so confidence-to-land scales with PR volume
  • Build the deploy and release path — canary, progressive rollout, health checks, automated rollback — in partnership with the platform teams who own the underlying substrate
  • Improve test reliability by quarantining, root-causing, and retiring intermittent failures
  • Shape CI and repository topology (build graph, test targeting, scope boundaries) to match how the company actually ships
  • Partner with platform, delivery infrastructure, and security teams, and represent Developer Productivity in cross-org pipeline decisions
  • Design processes (postmortem review, incident response, on-call) that help the team operate reliably and never fail the same way twice

Minimum qualifications

  • Significant backend or developer-infrastructure engineering experience, with hands-on responsibility for a high-leverage CI/CD, merge queue, or land pipeline at scale
  • Proficiency in Python and at least one statically-typed systems language (e.g., Go or Rust)
  • Experience operating CI/CD or release systems through production incidents, including writing postmortems and driving remediations
  • Demonstrated ability to work across team boundaries — building consensus with platform, security, and product engineering stakeholders
  • Comfort using AI coding tools as a daily part of your workflow, with informed opinions on where they provide leverage

Preferred qualifications

  • 7+ years of backend or developer-infrastructure experience
  • Experience with Bazel or similar build-graph / test-targeting systems at monorepo scale
  • Experience with progressive delivery or release engineering at scale (canary analysis, automated rollback, health-gated promotion)
  • A track record of leading — or making the well-reasoned case against — a repo split, monorepo extraction, or comparable scope-boundary migration
  • A history of authoring engineering policy or paved-path tooling that other teams adopted voluntarily
  • Familiarity with Kubernetes, Buildkite, GitHub Actions, or comparable CI/deploy substrates
  • Interest in the safe and beneficial development of AI

Representative projects

  • Reducing p50 merge-to-production time by re-architecting the merge queue and test selection strategy
  • Building an AI-assisted review layer that auto-approves low-risk changes and routes high-risk ones to the right reviewers
  • Designing a flaky-test quarantine and burndown system that returned CI signal to >99% reliability
  • Standing up canary and progressive rollout for a service fleet, with automated rollback on health regression
  • Authoring the RFC and migration plan for a build-graph or repository topology change adopted across multiple teams

Skills

Python, Go, Rust, Bazel, Kubernetes, Buildkite, GitHub Actions, CI/CD, Merge Queue, Canary Deployment