Senior Site Reliability Engineer
Senior SRE to operate and evolve the EKS-based Kubernetes platform, CI/CD pipelines, and observability stack for Thunderbird's open-source infrastructure. Requires 7+ years of infrastructure experience and strong production Kubernetes and IaC skills.
What you’ll do
- Operate and evolve our EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
- Design and develop CI/CD systems supporting websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
- Write and maintain infrastructure in Pulumi and/or Terraform/OpenTofu across multiple AWS accounts.
- Operate and evolve our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design.
- Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
- Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems.
- Participate in the on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and support service onboarding.
- Contribute to runbooks, architecture documentation, and team processes.
What you bring
- 7+ years of experience in infrastructure, platform engineering, or site reliability roles, including hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
- Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
- Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls.
- Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early.
- Excellent async written communication skills; comfortable working with a geographically distributed team.
- Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
- Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.
Bonus points for
- Experience with GitOps workflows (ArgoCD or Flux).
- Familiarity with Keycloak or similar identity platforms (OIDC, SAML, federation).
- Knowledge of email protocols and/or experience operating email infrastructure (SMTP, IMAP).
- Prior work in or alongside an open-source community.
- Proficiency in French, German, Japanese, or another language in addition to English.
Compensation & benefits
We benchmark our base salaries to local markets and target the 60th percentile of the peer market. The salary ranges for this role are:
- US: $123,000 - $144,000 USD
- Canada: $108,000 - $125,000 CAD
- UK: £62,000 - £72,000 GBP
We may consider candidates with strong skills but less than the required experience. Title, level and compensation will be determined based on qualifications and experience.
In addition to competitive salaries, we offer a comprehensive benefits package designed to support your whole self.
Work & career
- Fully remote work & schedule flexibility
- Company-provided laptop
- Annual bonus program
- Monthly remote work stipend
- Annual professional development stipend
- Industry conferences
- Company all-hands and team gatherings
Rest & play
- 24 days PTO per year (prorated)
- Your birthday
- Year-end company shutdown
- 9 wellbeing days
- Public holidays
- Other paid leave
- Quarterly wellbeing stipend for personal / family activities
Health & family
- 401(k) / RRSP contributions
- Health, dental, & vision insurance
- Disability insurance
- Life insurance
- Employee assistance program
- Paid parental leave
- Paid sick days