Senior Site Reliability Engineer
Senior SRE responsible for production infrastructure reliability, incident response, deployment automation, and scaling SaaS systems on Kubernetes and major cloud platforms.
What You’ll Do
- Own the ongoing reliability and robustness of Fivetran’s production infrastructure by monitoring availability, capacity, and throughput.
- Evolve our systems by building reliability work into the product roadmap.
- Coordinate the re-prioritization or fixing of critical bugs in response to support or sales requirements as needed.
- Work with engineering to recommend production infrastructure improvements that ensure 100% availability.
- Ensure scalable deployment of artifacts to all environments through automation scripts.
- Continuously monitor infrastructure for vulnerabilities and remediate them in collaboration with the security team.
Technologies You’ll Use
Kubernetes, PostgreSQL, ArgoCD, Terraform, Ansible, Python, Go, Java, AWS, GCP, Azure, Grafana, Buildkite, Temporal.
Skills We’re Looking For
- 5+ years of experience working with SaaS products at scale.
- Working knowledge of managed Kubernetes (EKS, AKS, and GKE).
- Knowledge of cloud platforms and related tooling: AWS, Azure, GCP, Terraform, Ansible, Buildkite, Pulumi, and ArgoCD.
- Experience with Python and shell scripting. Java, Go, or similar languages are a bonus.
- Experience with Linux operating system internals and administration.
- Experience with cloud networking such as VPNs, PrivateLink, and Private Service Connect (GCP).
- Experience with databases such as PostgreSQL.
Optional Bonus Skills
- Java or Go (Golang) programming skills.
Staff Site Reliability Engineer - Observability
Staff SRE focused on building and scaling a comprehensive observability platform on GCP using Terraform, Splunk, and Grafana. Requires 5+ years of GCP observability experience and strong coding skills in Python or Go.
Senior Platform Reliability Engineer
Senior Platform Reliability Engineer establishing reliability standards, observability, and incident response practices across engineering teams. Requires 6+ years of experience operating production systems at scale with AWS, Kubernetes, Terraform, and modern observability tooling.