Senior SRE

The Senior SRE owns production system reliability, designs monitoring and alerting, builds automation tooling, and ensures operational excellence in a regulated fintech environment. Requires 4+ years of SRE experience, deep AWS expertise, on-call experience at scale, and Nix experience.

San Francisco, CA | DevOps / SRE | Onsite | 4+ YOE

About the role

Responsibilities

Own reliability and operational excellence for production systems
Design and implement monitoring, alerting, and incident response processes
Build tooling to improve engineering team effectiveness
Establish on-call rotations and runbooks
Ensure the platform handles the demands of a regulated financial product
Spend 50%+ of time writing code: infrastructure tooling, automation, reliability improvements, developer productivity tools

Requirements (Must-haves)

4+ years experience in SRE, infrastructure, or platform engineering
Experience on a team of SREs at a company with mature SRE practices
Real on-call experience at scale in a large production environment
Deep AWS expertise (ECS, RDS, networking, security)
Strong experience with declarative infrastructure (Terraform, CDK, or similar)
Nix experience
Track record of building reliability tooling and automation
Can design and implement monitoring, alerting, and observability systems from first principles
Comfortable working in a regulated environment

Nice-to-haves

Experience at companies with strong SRE cultures (Google, Replit, Stripe, etc.)
Background in fintech, healthtech, or regulated domains
Experience migrating monitoring systems or implementing SLOs
Contributions to infrastructure tooling or open source projects

Technology Stack

Infrastructure: AWS (ECS, RDS, CloudFront, Lambda), CDK
Observability: Honeycomb, OpenTelemetry
CI/CD: GitHub Actions, Nix
Core platform: TypeScript/Node, PostgreSQL, React
Languages: TypeScript, Python, Nix, SQL

Compensation & Benefits

Stock options
Health insurance, 401(k), dental

Skills

AWS, ECS, RDS, Terraform, CDK, Nix, Honeycomb, OpenTelemetry, TypeScript, Postgres, Node.js, Python, GitHub Actions, Kubernetes, SLOs

Similar roles

Okta

Senior Site Reliability Engineer

Senior Site Reliability Engineer building and operating highly reliable, scalable Kubernetes-based cloud services in Okta's Emerging Products Group. Lead incident response, define SLOs, develop automation in Go/Python/Terraform, improve observability, and mentor on reliability best practices.

San Francisco, CA | DevOps / SRE | Hybrid | 5+ YOE | Go | AWS

Coinbase

Senior Software Engineer, Infrastructure

Senior engineer building and standardizing AWS/GCP cloud infrastructure, networking, and self-service tooling for Coinbase's multi-cloud platform.

186k – 219k | United States | DevOps / SRE | Remote | 5+ YOE | Go | AWS

Snowflake

Senior Software Engineer - Snowpark Container Service

Senior engineer to design, build, and lead development of Snowpark Container Services, a Kubernetes-based container compute platform. Requires 7+ years building large-scale distributed systems and strong coding skills in Java, C++, or Go.

200k – 288k | Bellevue, WA | DevOps / SRE | Hybrid | 7+ YOE | Go | C++

Upstart

Senior DevOps Engineer

Senior DevOps Engineer building and operating Kubernetes-based ephemeral environments and cloud infrastructure on AWS to improve developer productivity and platform reliability.

153k – 231k | United States | DevOps / SRE | Remote | 4+ YOE | Go | AWS

Tines

Senior Site Reliability Engineer - Government Cloud

Build and operate AWS GovCloud infrastructure for federal customers, owning IaC, container pipelines, compliance documentation, and operational tooling. Requires 5+ years AWS experience and FedRAMP familiarity.

210k – 220k | United States | DevOps / SRE | Remote | 5+ YOE | AWS | CDK