# Senior Site Reliability Engineer, Core AI Infrastructure
**Company:** [Coinbase](https://hotfix.jobs/companies/coinbase)
**Location:** Remote
**Salary:** $186K-$219K
**Experience:** 5+ years
**Skills:** AWS, Terraform, Ansible, Chef, Puppet, Salt, Docker, Kubernetes, Python, Bash, Ruby, Go, Git, CI/CD
**Posted:** 2026-06-08
> Senior SRE owning reliability, monitoring, and automation for Coinbase's AI infrastructure on AWS and Kubernetes. Requires 5+ years cloud automation experience and strong incident response skills.
## Responsibilities
- Own the reliability, monitoring, and incident response lifecycle for AI infrastructure services, including on-call support for AWS deployment pipelines, root cause analysis, and blameless retros.
- Build automation and tooling to streamline operational IT workflows, eliminate manual tasks, and improve deployment velocity across CI/CD frameworks and Kubernetes environments.
- Partner with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms, and with Security and Compliance to integrate surveillance tooling into deployment pipelines.
- Strengthen observability and documentation standards across IT engineering by defining metrics, implementing monitoring solutions, and maintaining technical documentation that sets a standard of excellence.
- Develop full-stack applications in Go or Python that power internal AI products and infrastructure.

## Requirements
- 5+ years of experience automating and supporting cloud infrastructure (AWS) and network environments, with hands-on use of infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet, or Salt).
- Proven experience deploying, managing, and troubleshooting containerized workloads using Docker and Kubernetes in production environments.
- Proficiency in at least one scripting or programming language (Python, Bash, Ruby, or Go) and version control workflows using Git-based CI/CD pipelines.
- Track record of leading incident response in environments with strict SLAs, including root cause analysis, blameless retros, and measurable reliability improvements.
- Responsible use of generative AI, maintaining human oversight to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.

## Nice to Haves
- Expertise with Linux, Bash, Ruby, Python, and/or Go
- Expertise automating EC2 or container deployments with Terraform
- Strong network security fundamentals
- Experience managing and leveraging log aggregation
- Experience working in a highly regulated environment
- Experience in a fast-paced, high-growth company
- Experience in a remote-first IT environment
**Apply:** https://hotfix.jobs/jobs/senior-site-reliability-engineer-core-ai-infrastructure-at-coinbase-c520a948-2e6c-4143-adca-8fe07eaf298e
**Canonical:** https://hotfix.jobs/jobs/senior-site-reliability-engineer-core-ai-infrastructure-at-coinbase-c520a948-2e6c-4143-adca-8fe07eaf298e