Sr/Staff Site Reliability Engineer

Senior/Staff SRE responsible for building and maintaining cloud-native infrastructure on GCP using Terraform, Kubernetes, Istio, and observability tools to ensure reliability and scalability of fintech systems.

Chicago, IL · Redwood City, CA · DevOps / SRE · Hybrid · 6+ YOE

About the role

Responsibilities

  • Write Terraform modules for deploying infrastructure resources via GitLab pipelines
  • Develop Helm charts for deploying services and jobs in Kubernetes clusters
  • Define metrics, network policies, and routing rules for Istio service mesh
  • Monitor and maintain GCP BigQuery and Spanner databases
  • Pipe metrics to Google-managed Prometheus and build Grafana dashboards and alerts
  • Experiment with GCP offerings, third-party vendors, and open-source tools to automate and secure operations
  • Leverage LLMs in developing infrastructure and tooling
  • Pair with engineering leads to instrument and monitor critical functionality
  • Add automation to existing and new systems to reduce reliance on manual processes
  • Participate in architecture design and capacity planning discussions
  • Build, maintain, and improve CI/CD pipelines

Requirements

  • 6+ years of experience building and maintaining large-scale cloud-native infrastructure (AWS and/or GCP)
  • Experience with Docker, Kubernetes, and Istio or similar service mesh
  • Experience with SQL databases such as MySQL, Google BigQuery, and Google Spanner
  • Experience with streaming technologies such as Kafka and Amazon Kinesis
  • Experience with pub/sub technologies such as AWS SNS and Google Pub/Sub
  • Experience with serverless technologies such as AWS Lambda and Google Cloud Functions/Cloud Run
  • Experience with Terraform
  • Experience with observability tools such as Datadog, Prometheus, and Grafana
  • Strong computer science and software engineering fundamentals
  • Experience with SOC2 compliance processes

Nice-to-Haves

  • Comfortable wearing many hats in a fast-paced environment
  • Willingness to learn, teach, and provide/receive feedback
  • Desire to automate processes and tinker with new technologies

Skills

Terraform, Kubernetes, Docker, Istio, GCP, BigQuery, Spanner, Prometheus, Grafana, Kafka
