Senior Site Reliability Engineer

175k – 210k | Oakland, CA | Hybrid | 5+ YOE | Jun 18

Summary

Senior SRE responsible for production infrastructure reliability, incident response, deployment automation, and scaling SaaS systems on Kubernetes and major cloud platforms.

About the role

What You’ll Do

Own the ongoing reliability and robustness of Fivetran’s production infrastructure by monitoring availability, capacity, and throughput.
Evolve systems by building reliability into our product roadmap.
Coordinate the reprioritization or fixing of critical bugs for support or sales requirements as needed.
Recommend improvements to production infrastructure, working with engineering toward the goal of 100% availability.
Ensure scalable artifact deployment to all environments through automation scripts.
Continuously monitor for infrastructure vulnerabilities and remediate them with the security team.

Technologies You’ll Use

Kubernetes, PostgreSQL, ArgoCD, Terraform, Ansible, Python, Go, Java, AWS, GCP, Azure, Grafana, Buildkite, Temporal.

Skills We’re Looking For

5+ years of experience working with SaaS products at scale.
Working knowledge of managed Kubernetes (EKS, AKS and GKE).
Knowledge of cloud platforms and related tooling: AWS, Azure, GCP, Terraform, Ansible, Buildkite, Pulumi, and ArgoCD.
Experience in Python/Shell scripting. Bonus if you have Java, Go, etc.
Experience with Linux operating system internals and administration.
Experience with cloud networking, such as VPNs, PrivateLink, and Private Service Connect (GCP).
Experience with databases such as PostgreSQL.

Optional Bonus Skills

Java or Go programming skills.

Skills

Kubernetes, PostgreSQL, Terraform, Ansible, Python, AWS, GCP, Azure, ArgoCD, Linux
