Senior Site Reliability Engineer

123k – 144k · United States · Remote · 7+ YOE
Summary

Senior SRE to operate and evolve the EKS-based Kubernetes platform, CI/CD pipelines, and observability stack for Thunderbird's open-source infrastructure. Requires 7+ years of infrastructure experience and strong production Kubernetes and IaC skills.

About the role

What you’ll do

  • Operate and evolve our EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
  • Design and develop CI/CD systems supporting websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
  • Write and maintain infrastructure in Pulumi and/or Terraform/OpenTofu across multiple AWS accounts.
  • Operate and evolve our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design.
  • Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
  • Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems.
  • Participate in the on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and support service onboarding.
  • Contribute to runbooks, architecture documentation, and team processes.

What you bring

  • 7+ years of experience in infrastructure, platform engineering, or site reliability roles, including hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
  • Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
  • Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls.
  • Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early.
  • Excellent async written communication skills; comfortable working with a geographically distributed team.
  • Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
  • Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.

Bonus points for

  • Experience with GitOps workflows (ArgoCD or Flux).
  • Familiarity with Keycloak or similar identity platforms (OIDC, SAML, federation).
  • Knowledge of email protocols and/or experience operating email infrastructure (SMTP, IMAP).
  • Prior work in or alongside an open-source community.
  • French, German, Japanese, or other language proficiency in addition to English.

Compensation & benefits

We benchmark our base salaries to local markets and target the 60th percentile of the peer market. The salary ranges for this role are:

  • US: $123,000 - $144,000 USD
  • Canada: $108,000 - $125,000 CAD
  • UK: £62,000 - £72,000 GBP

We may consider candidates with strong skills but less than the required experience. Title, level and compensation will be determined based on qualifications and experience.

In addition to competitive salaries, we offer a comprehensive benefits package designed to support your whole self.

Work & career

  • Fully remote work & schedule flexibility
  • Company-provided laptop
  • Annual bonus program
  • Monthly remote work stipend
  • Annual professional development stipend
  • Industry conferences
  • Company all-hands and team gatherings

Rest & play

  • 24 days PTO per year (prorated)
  • Your birthday
  • Year-end company shutdown
  • 9 wellbeing days
  • Public holidays
  • Other paid leave
  • Quarterly wellbeing stipend for personal / family activities

Health & family

  • 401(k) / RRSP contributions
  • Health, dental, & vision insurance
  • Disability insurance
  • Life insurance
  • Employee assistance program
  • Paid parental leave
  • Paid sick days
Skills
Kubernetes, AWS, Terraform, Pulumi, OpenTofu, CI/CD, GitHub Actions, Observability, Grafana, VictoriaMetrics, IAM, AWS Secrets Manager, GitOps, ArgoCD, Flux