Staff Site Reliability Engineer, Release Engineering

208k – 274k | New York, NY | DevOps / SRE | Hybrid | 8+ YOE | Jun 19

Summary

Staff SRE on the Release Engineering team defining and scaling reliability practices, architecting SLO/error-budget programs, and driving progressive delivery and automated safety gates across product engineering.

About the role

What excites you

Lead the expansion of reliability standards across product engineering, converting foundational infrastructure into lasting operational habits and tooling.
Architect and manage the SLO and error-budget framework, enabling teams to use reliability data in product and release decisions.
Promote widespread use of progressive delivery and automated safety gates, ensuring high velocity without compromising production stability.
Guide emerging product teams toward production readiness through expertise in observability, incident response, and scalable deployment health.
Collaborate with SRE, Platform, and Infrastructure teams to transform complex production requirements into intuitive, self-service platform features.
Direct the response to critical incidents and ensure the resulting post-mortem actions yield permanent improvements to the platform.
Prepare for an AI-driven development landscape by scaling our safety nets to handle an increased volume and frequency of code changes.

What excites us

Over 8 years of professional experience in backend systems, SRE, or platform engineering roles.
Proven track record of designing reliability programs—such as service maturity models or SLI frameworks—that achieved cross-team adoption.
Direct experience building or operating canary rollout systems, metric-gated analysis, or automated rollback infrastructure.
Technical proficiency in software development, with a preference for Go or similar systems languages.
Ability to drive organizational change and influence engineering culture without formal authority.
Sound technical judgment in high-stakes production scenarios, balancing user impact with developer velocity.
Prior exposure to Kubernetes, service mesh technologies, Prometheus, or ArgoCD is considered a strong asset.

Compensation and Benefits

Additional compensation in the form of equity and/or commission depends on the position offered.
Plaid provides a comprehensive benefits plan, including medical, dental, vision, and 401(k).

Skills

Go, Kubernetes, Prometheus, ArgoCD, service mesh, SLO, error budgets, canary rollouts, observability, incident response
