# Staff Site Reliability Engineer
**Company:** [Coinbase](https://hotfix.jobs/companies/coinbase)
**Location:** Remote
**Salary:** $218K-$257K
**Experience:** 8+ years
**Skills:** AWS, Terraform, Ansible, Chef, Puppet, Salt, Docker, Kubernetes, Python, Go, Bash, Ruby, Git, CI/CD
**Posted:** 2026-06-08
> Staff SRE on the IT Operations team owning reliability, automation, and observability for Coinbase's AI infrastructure on AWS and Kubernetes. Requires 8+ years of cloud infrastructure experience and strong incident response leadership.
## Responsibilities
- Own the reliability, monitoring, and incident response lifecycle for AI infrastructure services, including on-call support for AWS deployment pipelines, root cause analysis, and blameless retros.
- Build automation and tooling to streamline operational IT workflows, eliminate manual tasks, and improve deployment velocity across CI/CD frameworks and Kubernetes environments.
- Partner with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms, and with Security and Compliance to integrate surveillance tooling into deployment pipelines.
- Strengthen observability and documentation standards across IT engineering by defining metrics, implementing monitoring solutions, and maintaining technical documentation that sets a standard of excellence.
- Develop full-stack applications that power internal AI products and infrastructure with Go or Python.

## Requirements
- 8+ years of experience automating and supporting cloud infrastructure (AWS) and network environments, with hands-on use of infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet, or Salt).
- Proven experience deploying, managing, and troubleshooting containerized workloads using Docker and Kubernetes in production environments.
- Proficiency in at least one scripting or programming language (Python, Bash, Ruby, or Go) and version control workflows using Git-based CI/CD pipelines.
- Track record of leading incident response in environments with strict SLAs, including root cause analysis, blameless retros, and measurable reliability improvements.
- Responsible use of generative AI, maintaining human oversight to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.

## Nice to Haves
- Expertise with Linux, Bash, Ruby, Python, and/or Go
- Expertise automating EC2 or container deployments with Terraform
- Strong network security fundamentals
- Experience managing and leveraging log aggregation
- Experience working in a highly regulated environment
- Experience in a fast-paced, high-growth company
- Experience in a remote-first IT environment
**Apply:** https://hotfix.jobs/jobs/staff-site-reliability-engineer-at-coinbase-0c78f444-e426-40c8-bdf8-d569c75598ce
**Canonical:** https://hotfix.jobs/jobs/staff-site-reliability-engineer-at-coinbase-0c78f444-e426-40c8-bdf8-d569c75598ce