Staff Site Reliability Engineer, Release Engineering

208k – 274k | New York, NY | DevOps / SRE | Hybrid | 8+ YOE
Summary

Staff SRE on the Release Engineering team defining and scaling reliability practices, architecting SLO/error-budget programs, and driving progressive delivery and automated safety gates across product engineering.

About the role

What excites you

  • Lead the expansion of reliability standards across product engineering, converting foundational infrastructure into lasting operational habits and tooling.
  • Architect and manage the SLO and error-budget framework, empowering teams to utilize reliability data for strategic product and release choices.
  • Promote widespread use of progressive delivery and automated safety gates, ensuring high velocity without compromising production stability.
  • Guide emerging product teams toward production readiness through expertise in observability, incident response, and scalable deployment health.
  • Collaborate with SRE, Platform, and Infrastructure teams to transform complex production requirements into intuitive, self-service platform features.
  • Direct the response to critical incidents and ensure the resulting post-mortem actions yield permanent improvements to the platform.
  • Prepare for an AI-driven development landscape by scaling our safety nets to handle an increased volume and frequency of code changes.

What excites us

  • Over 8 years of professional experience in backend systems, SRE, or platform engineering roles.
  • Proven track record of designing reliability programs—such as service maturity models or SLI frameworks—that achieved cross-team adoption.
  • Direct experience building or operating canary rollout systems, metric-gated analysis, or automated rollback infrastructure.
  • Technical proficiency in software development, with a preference for Go or similar systems languages.
  • Ability to drive organizational change and influence engineering culture without formal authority.
  • Sound technical judgment in high-stakes production scenarios, balancing user impact with developer velocity.
  • Prior exposure to Kubernetes, service mesh technologies, Prometheus, or ArgoCD is considered a strong asset.

Compensation and Benefits

  • Additional compensation in the form of equity and/or commission depends on the position offered.
  • Plaid provides a comprehensive benefit plan, including medical, dental, vision, and 401(k).
Skills
Go, Kubernetes, Prometheus, ArgoCD, service mesh, SLO, error budgets, canary rollouts, observability, incident response