Senior Site Reliability Engineer

123k – 144k · United States · Remote · 7+ YOE · Jun 8

Summary

Senior SRE to operate and evolve the EKS-based Kubernetes platform, CI/CD pipelines, and observability stack for Thunderbird's open-source infrastructure. Requires 7+ years of infrastructure experience and strong production Kubernetes and IaC skills.

About the role

What you’ll do

Operate and evolve our EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
Design and develop CI/CD systems supporting websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
Write and maintain infrastructure in Pulumi and/or Terraform/OpenTofu across multiple AWS accounts.
Operate and evolve our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design.
Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems.
Participate in the on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and to support service onboarding.
Contribute to runbooks, architecture documentation, and team processes.

What you bring

7+ years of experience in infrastructure, platform engineering, or site reliability roles, including hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls.
Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early.
Excellent async written communication skills; comfortable working with a geographically distributed team.
Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.

Bonus points for

Experience with GitOps workflows (ArgoCD or Flux).
Familiarity with Keycloak or similar identity platforms (OIDC, SAML, federation).
Knowledge of email protocols and/or experience operating email infrastructure (SMTP, IMAP).
Prior work in or alongside an open-source community.
Proficiency in French, German, Japanese, or another language in addition to English.

Compensation & benefits

We benchmark our base salaries to local markets and target the 60th percentile of the peer market. The salary ranges for this role are:

US: $123,000 - $144,000 USD
Canada: $108,000 - $125,000 CAD
UK: £62,000 - £72,000 GBP

We may consider candidates with strong skills but less than the required experience. Title, level and compensation will be determined based on qualifications and experience.

In addition to competitive salaries, we offer a comprehensive benefits package designed to support your whole self.

Work & career

Fully remote work & schedule flexibility
Company-provided laptop
Annual bonus program
Monthly remote work stipend
Annual professional development stipend
Industry conferences
Company all-hands and team gatherings

Rest & play

24 days PTO per year (prorated)
Your birthday
Year-end company shutdown
9 wellbeing days
Public holidays
Other paid leave
Quarterly wellbeing stipend for personal / family activities

Health & family

401(k) / RRSP contributions
Health, dental, & vision insurance
Disability insurance
Life insurance
Employee assistance program
Paid parental leave
Paid sick days

Skills

Kubernetes, AWS, Terraform, Pulumi, OpenTofu, CI/CD, GitHub Actions, Observability, Grafana, VictoriaMetrics, IAM, AWS Secrets Manager, GitOps, ArgoCD, Flux
