# Sr/Staff Site Reliability Engineer

**Company:** [Attain](https://hotfix.jobs/companies/attain)
**Location:** Chicago, IL; Redwood City, CA
**Role:** DevOps / SRE
**Experience:** 6+ years
**Skills:** Terraform, Kubernetes, Docker, Istio, GCP, BigQuery, Spanner, Prometheus, Grafana, Kafka
**Posted:** 2026-01-15

> Senior/Staff SRE responsible for building and maintaining cloud-native infrastructure on GCP using Terraform, Kubernetes, Istio, and observability tools to ensure reliability and scalability of fintech systems.

## Responsibilities
- Write Terraform modules for deploying infrastructure resources via GitLab pipelines
- Develop Helm charts for deploying services and jobs in Kubernetes clusters
- Define metrics, network policies, and routing rules for Istio service mesh
- Monitor and maintain GCP BigQuery and Spanner databases
- Pipe metrics to Google-managed Prometheus and build Grafana dashboards and alerts
- Experiment with GCP offerings, third-party vendors, and open-source tools to automate and secure operations
- Use LLMs to develop infrastructure and tooling
- Pair with engineering leads to instrument and monitor critical functionality
- Add automation to existing and new systems to reduce reliance on manual processes
- Participate in architecture design and capacity planning discussions
- Build, maintain, and improve CI/CD pipelines

## Requirements
- 6+ years of experience building and maintaining large-scale cloud-native infrastructure (AWS and/or GCP)
- Experience with Docker, Kubernetes, and Istio or similar service mesh
- Experience with SQL databases such as MySQL, Google BigQuery, and Google Spanner
- Experience with streaming technologies such as Kafka and Amazon Kinesis
- Experience with pub/sub technologies such as AWS SNS and Google Pub/Sub
- Experience with serverless technologies such as AWS Lambda and Google Cloud Functions/Cloud Run
- Experience with Terraform
- Experience with observability tools such as Datadog, Prometheus, and Grafana
- Strong computer science and software engineering fundamentals
- Experience with SOC2 compliance processes

## Nice-to-Haves
- Comfortable wearing many hats in a fast-paced environment
- Willingness to learn, teach, and provide/receive feedback
- Desire to automate processes and tinker with new technologies

**Apply:** https://hotfix.jobs/jobs/sr-staff-site-reliability-engineer-at-attain-c3f21a66-f30f-4791-970d-f6bfb7a4f485
**Canonical:** https://hotfix.jobs/jobs/sr-staff-site-reliability-engineer-at-attain-c3f21a66-f30f-4791-970d-f6bfb7a4f485