Anthropic DevOps / SRE Jobs
Open DevOps / SRE roles at Anthropic, pulled live from their hiring system.
72% of open DevOps / SRE roles call out Python; Go and Kubernetes each appear in roughly a third. Most of these roles are on-site or hybrid; none are fully remote.
Staff Software Engineer, Infrastructure Asset Systems
As a Staff Software Engineer, you will build and extend systems for tracking, governing, and reporting on infrastructure assets. This involves designing data models, workflow engines, and integrations with financial and procurement systems, ensuring compliance and auditability.
Performance Engineer, Inference Systems
Performance engineer focused on cross-layer investigations of Anthropic's inference fleet for Claude, optimizing throughput, latency, reliability, and correctness while building observability and partnering with kernel and serving teams.
Staff Fiber Network Engineer
Owns end-to-end physical layer of private global dark-fiber backbone network, including route design, fiber acquisition, vendor management, acceptance testing, and lifecycle management. Requires deep OSP/fiber expertise, optical transport knowledge, and 8+ years experience building fiber programs.
Software Engineer, Systems - Claude Code
Optimizes performance and reliability of Claude Code and Bun JavaScript runtime through low-level systems programming. Requires deep expertise in C/C++/Rust, syscalls, memory management, and runtime internals with 5+ years experience.
Network Engineer, Capacity and Efficiency
Network engineer focused on observability, telemetry, cost modeling, and efficiency optimization for large-scale AI infrastructure networks across data centers, backbones, and cloud providers. Requires 5+ years experience with production networking, BGP/EVPN/QoS fluency, and Python/Go tooling.
Staff Software Engineer, Kubernetes Platform
Staff engineer scales massive Kubernetes clusters for AI model training, owning scheduler, control plane, and core services for reliability at extreme scale. Requires deep Kubernetes expertise, systems programming, and 8+ years experience.
Staff+ Software Engineer, Developer Productivity
Leads technical strategy and builds scalable developer infrastructure including build systems, CI/CD pipelines, and tooling for large monorepo environments. Requires 3+ years leading complex projects, proficiency in Python/Rust/Go, and experience with container orchestration.
Staff Infrastructure Engineer, Cluster Infrastructure
Leads technical strategy for agent-driven cluster lifecycle management, provisioning, and scalability across cloud providers and datacenters. Requires deep expertise in distributed systems, Kubernetes, IaC tools like Terraform, and systems languages like Rust/Go/Python; 8+ years experience preferred.
Incident Response Manager - Product & Engineering
Leads incident response operations for product and engineering, serving as on-call commander to coordinate cross-functional teams, manage communications, and improve processes during high-stakes incidents. Requires 5+ years in incident management with technical depth in infrastructure and cloud systems.
Staff Software Engineer, Node Infra
Leads technical strategy for node lifecycle management, scales AI clusters across clouds, and builds automated hardware health systems for large-scale accelerator fleets. Requires deep expertise in distributed systems, cloud platforms, systems programming, and ML accelerators.
Staff Engineer, Datacenter Server Lifecycle
Owns end-to-end server lifecycle in datacenters at scale, from provisioning to decommissioning, with strong focus on automation, trusted compute security, and hardware operations for AI workloads. Requires hands-on server hardware experience and proficiency in Python/Rust/Go plus cloud infra like Kubernetes/AWS/GCP.
Research Engineer, RL Infrastructure and Reliability (Knowledge Work)
Owns reliability, observability, and infrastructure for Knowledge Work team's RL training environments and evaluations. Ensures stability at scale through proactive hardening, SLOs, load testing, and incident response for ML systems.
Field Services Engineer
Deploys and maintains network infrastructure across US sites, oversees fiber optic vendors, manages RMAs and spares, and builds scalable processes. Requires 3+ years in field engineering or network ops, fiber knowledge, and 40-60% travel.
Staff+ Software Engineer, Platform
Staff-level software engineer builds and scales platform infrastructure across teams, including dev tools, service infra, multicloud, auth, connectivity, API distributability, and ML adaptation systems. Requires 8+ years full-stack experience with Staff leadership, focusing on robust, scalable solutions in fast-paced AI environment.
IT Systems Engineer, Enterprise SaaS
Designs and owns architecture for enterprise SaaS platforms like Google Workspace and Slack, builds API integrations and automations, and ensures security and scalability. Requires 8+ years experience with deep hands-on SaaS admin and coding against APIs.
Finance Systems Integration Engineer
Designs and builds integrations between ERP and financial apps such as Salesforce and Brex using iPaaS tools (Workato, MuleSoft). Supports ERP implementation, develops AI agents with the Claude API, and maintains data pipelines with Airflow/dbt. Requires 8+ years experience.
Performance Engineer
Performance Engineer optimizes throughput and robustness of large-scale ML distributed systems by solving novel performance issues. Requires significant software engineering experience at supercomputing scale and interest in ML.