Incident Response Manager - Product & Engineering
Leads incident response operations for product and engineering, serving as on-call commander to coordinate cross-functional teams, manage communications, and improve processes during high-stakes incidents. Requires 5+ years in incident management with technical depth in infrastructure and cloud systems.
Responsibilities
- Build the incident response management function, establishing the processes, tooling, and operational standards that define how we handle incidents at scale
- Serve as an on-call incident commander, driving coordinated response across technical and non-technical stakeholders during incidents of varying severity, including managing multiple active incidents simultaneously
- Engage the right people at the right time, with a strong sense of urgency, bringing order and direction to fast-moving, ambiguous situations
- Own incident communications end-to-end, from real-time internal coordination to external channels like status pages, direct customer outreach, and stakeholder updates, ensuring they reflect Anthropic's commitments to safety, transparency, and accuracy
- Participate in blameless incident reviews, contributing operational context and helping drive follow-through on critical remediations so the same class of incident does not recur
- Partner with engineering teams to develop and maintain incident response policies, procedures, and escalation frameworks that scale with Anthropic's growth
- Partner with engineering, product, security, legal, and go-to-market teams to continuously improve how the organization detects, responds to, and learns from incidents
You May Be a Good Fit If You
- Have 5+ years of experience in incident management, with direct experience managing technical product or infrastructure incidents (not exclusively security or trust and safety)
- Have built or significantly shaped an incident response program, ideally at a high-growth startup or in an environment where you had to create structure rather than inherit it
- Demonstrate a strong sense of ownership and urgency, with the ability to operate independently and make sound decisions under pressure without waiting for direction
- Are comfortable working in unprecedented situations where processes are still being defined and guidance may be incomplete or conflicting, and leave things better than you found them
- Have a track record of effective cross-functional collaboration, particularly with engineering, security, legal, communications, go-to-market, and executive leadership
- Bring a blameless, learning-oriented mindset to incident reviews, focused on systemic improvement rather than individual fault
- Have experience with cloud infrastructure incidents and enough technical depth across the stack to engage meaningfully with engineering teams during response, including comfort navigating distributed systems, monitoring tools, and logs
- Are analytically minded, with experience using data (incident metrics, queries, trend analysis) to inform decisions during response and to drive operational improvements over time
- Communicate clearly and calmly under pressure, both in real-time coordination and in post-incident written communications
- Thrive in high-volume, fast-paced environments and are energized by bringing operational discipline to complex, evolving situations
Annual Salary: $290,000 to $365,000 USD