Senior SRE

The Senior SRE owns production system reliability, designs monitoring and alerting, builds automation tooling, and ensures operational excellence in a regulated fintech environment. Requires 4+ years of SRE experience, deep AWS expertise, on-call experience at scale, and Nix experience.

San Francisco, CA · DevOps / SRE · Onsite · 4+ YOE

About the role

Responsibilities

  • Own reliability and operational excellence for production systems
  • Design and implement monitoring, alerting, and incident response processes
  • Build tooling to improve engineering team effectiveness
  • Establish on-call rotations and runbooks
  • Ensure the platform handles the demands of a regulated financial product
  • Spend 50%+ of your time writing code: infrastructure tooling, automation, reliability improvements, developer productivity tools

Requirements (Must-haves)

  • 4+ years of experience in SRE, infrastructure, or platform engineering
  • Experience on a team of SREs at a company with mature SRE practices
  • Real on-call experience at scale in a large production environment
  • Deep AWS expertise (ECS, RDS, networking, security)
  • Strong experience with declarative infrastructure (Terraform, CDK, or similar)
  • Nix experience
  • Track record of building reliability tooling and automation
  • Can design and implement monitoring, alerting, and observability systems from first principles
  • Comfortable working in a regulated environment

Nice-to-haves

  • Experience at companies with strong SRE cultures (Google, Replit, Stripe, etc.)
  • Background in fintech, healthtech, or regulated domains
  • Experience migrating monitoring systems or implementing SLOs
  • Contributions to infrastructure tooling or open source projects

Technology Stack

Infrastructure: AWS (ECS, RDS, CloudFront, Lambda), CDK
Observability: Honeycomb, OpenTelemetry
CI/CD: GitHub Actions, Nix
Core platform: TypeScript/Node, PostgreSQL, React
Languages: TypeScript, Python, Nix, SQL

Compensation & Benefits

  • Stock options
  • Health insurance, dental, 401(k)

Skills

AWS, ECS, RDS, Terraform, CDK, Nix, Honeycomb, OpenTelemetry, TypeScript, Postgres, Node.js, Python, GitHub Actions, Kubernetes, SLOs
