Staff Site Reliability Engineer, Release Engineering
208k – 274k | New York, NY | DevOps / SRE | Hybrid | 8+ YOE
Summary
Staff SRE on the Release Engineering team defining and scaling reliability practices, architecting SLO/error-budget programs, and driving progressive delivery and automated safety gates across product engineering.
About the role
What excites you
- Lead the expansion of reliability standards across product engineering, converting foundational infrastructure into lasting operational habits and tooling.
- Architect and manage the SLO and error-budget framework, empowering teams to utilize reliability data for strategic product and release choices.
- Promote widespread use of progressive delivery and automated safety gates, ensuring high velocity without compromising production stability.
- Guide emerging product teams toward production readiness through expertise in observability, incident response, and scalable deployment health.
- Collaborate with SRE, Platform, and Infrastructure teams to transform complex production requirements into intuitive, self-service platform features.
- Direct the response to critical incidents and ensure the resulting post-mortem actions yield permanent improvements to the platform.
- Prepare for an AI-driven development landscape by scaling our safety nets to handle an increased volume and frequency of code changes.
What excites us
- Over 8 years of professional experience in backend systems, SRE, or platform engineering roles.
- Proven track record of designing reliability programs—such as service maturity models or SLI frameworks—that achieved cross-team adoption.
- Direct experience building or operating canary rollout systems, metric-gated analysis, or automated rollback infrastructure.
- Technical proficiency in software development, with a preference for Go or similar systems languages.
- Ability to drive organizational change and influence engineering culture without formal authority.
- Sound technical judgment in high-stakes production scenarios, balancing user impact with developer velocity.
- Prior exposure to Kubernetes, service mesh technologies, Prometheus, or ArgoCD is considered a strong asset.
Compensation and Benefits
- Additional compensation in the form of equity and/or commission depends on the position offered.
- Plaid provides a comprehensive benefit plan, including medical, dental, vision, and 401(k).
Skills
Go, Kubernetes, Prometheus, ArgoCD, service mesh, SLO, error budgets, canary rollouts, observability, incident response