Site Reliability Engineer

150k – 170k | Boston, MA | New York, NY | DevOps / SRE | Remote | 5+ YOE | Apr 17

Summary

Builds and maintains reliable cloud infrastructure for large-scale ML on biosignal data, managing Kubernetes clusters, CI/CD pipelines, observability, and security. Requires 5+ years of SRE/DevOps experience, Kubernetes expertise, IaC with Terraform, and cloud proficiency.

About the role

Responsibilities

Design and implement infrastructure as code solutions that improve reliability, security, and maintainability of our cloud infrastructure
Lead and execute major infrastructure initiatives including cluster upgrades, security improvements, and architectural changes
Develop and maintain CI/CD pipelines that enable teams to deploy safely and efficiently
Improve observability across our systems through enhanced monitoring, logging, and alerting
Participate in an on-call rotation and lead incident response efforts when issues arise
Collaborate with development teams to improve application reliability and performance
Maintain and enhance our security posture through infrastructure hardening and automation
Create and maintain documentation for infrastructure, deployment processes, and incident response procedures

Requirements

Strong experience with Kubernetes administration, including cluster management, security, and troubleshooting
Proven track record implementing infrastructure as code using Terraform or similar tools
Experience building and maintaining CI/CD pipelines, particularly with GitHub Actions, Azure DevOps, or ArgoCD
Solid understanding of container technologies and build processes, especially Docker
Strong knowledge of a major cloud provider (e.g. AWS), including networking, security, and infrastructure services; experience with Azure is a plus
Experience with incident response and on-call responsibilities in a production environment
Deep experience with Linux systems administration and debugging; familiarity with Windows Server environments is a plus
Proficiency in at least one programming language (Python, Go, TypeScript, etc.)
Understanding of security and networking concepts, including OAuth2/OIDC, DNS, TLS, TCP/UDP, etc.

Approximate experience: Bachelor's degree plus 5-8 years of experience in SRE, DevOps, or a similar role

Compensation

Salary range: $150,000 – $170,000 (adjusted based on experience, skills, and location). Includes equity, PTO, and other benefits.

Skills

Kubernetes, Terraform, CI/CD, GitHub Actions, Docker, AWS, Azure, Linux, Python, Go

