
Staff Site Reliability Engineer

Staff SRE on the IT Operations team owning reliability, automation, and observability for Coinbase's AI infrastructure on AWS and Kubernetes. Requires 8+ years of cloud infrastructure experience and strong incident response leadership.

218k – 257k · United States · DevOps / SRE · Remote · 8+ YOE

About the role

Responsibilities

  • Own the reliability, monitoring, and incident response lifecycle for AI infrastructure services, including on-call support for AWS deployment pipelines, root cause analysis, and blameless retros.
  • Build automation and tooling to streamline operational IT workflows, eliminate manual tasks, and improve deployment velocity across CI/CD frameworks and Kubernetes environments.
  • Partner with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms, and with Security and Compliance to integrate surveillance tooling into deployment pipelines.
  • Strengthen observability and documentation standards across IT engineering by defining metrics, implementing monitoring solutions, and maintaining technical documentation that sets a standard of excellence.
  • Develop full-stack applications in Go or Python that power internal AI products and infrastructure.

Requirements

  • 8+ years of experience automating and supporting cloud infrastructure (AWS) and network environments, with hands-on use of infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet, or Salt).
  • Proven experience deploying, managing, and troubleshooting containerized workloads using Docker and Kubernetes in production environments.
  • Proficiency in at least one scripting or programming language (Python, Bash, Ruby, or Go) and version control workflows using Git-based CI/CD pipelines.
  • Track record of leading incident response in environments with strict SLAs, including root cause analysis, blameless retros, and measurable reliability improvements.
  • Responsible use of generative AI, maintaining human oversight to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.

Nice to Haves

  • Expertise with Linux, Bash, Ruby, Python, and/or Go
  • Expertise automating EC2 or container deployments with Terraform
  • Strong network security fundamentals
  • Experience managing and leveraging log aggregation
  • Experience working in a highly regulated environment
  • Experience in a fast-paced, high-growth company
  • Experience in a remote-first IT environment

Skills

AWS, Terraform, Ansible, Chef, Puppet, Salt, Docker, Kubernetes, Python, Go, Bash, Ruby, Git, CI/CD
