Staff, Site Reliability Engineer (SRE)

160k – 255k | San Francisco, CA | Menlo Park, CA | DevOps / SRE | Hybrid | 8+ YOE | May 27

Summary

Staff SRE responsible for designing and improving infrastructure reliability, security, and observability for a hybrid healthcare delivery platform. Requires 8+ years of experience in SRE or related infrastructure roles with deep expertise in Terraform, AWS/GCP, and production systems.

About the role

Responsibilities

Design, build, and improve the infrastructure that powers patient care, clinician operations, internal tooling, and partner-facing systems
Improve reliability across distributed systems, cloud infrastructure, CI/CD, observability, and incident response
Raise the security baseline across cloud infrastructure, access controls, secrets management, identity, and operational workflows
Build and maintain infrastructure as code using Terraform and related tooling
Automate manual infrastructure and operational processes through scripting, tooling, and platform improvements
Partner with engineering teams to improve system architecture, deployment practices, monitoring, logging, and alerting
Troubleshoot complex issues across infrastructure, application, data, and operational boundaries
Help define reliability, security, and infrastructure standards that allow the company to scale without creating brittle systems
Support incident response practices, postmortems, operational readiness, and continuous improvement across engineering
Make pragmatic tradeoffs between reliability, security, speed, and simplicity in a fast-moving startup environment

Requirements

8+ years in site reliability engineering, platform engineering, infrastructure engineering, security engineering, or related technical roles
Led high-impact infrastructure, reliability, platform, or security projects end to end with minimal oversight
Built and operated production systems in cloud environments (AWS and/or GCP)
Worked deeply with infrastructure as code, ideally Terraform
Improved observability, monitoring, logging, alerting, and incident response practices across engineering teams
Automated infrastructure, deployment, or operational workflows using scripting languages such as Python, Bash, or TypeScript
Improved cloud security, access management, secrets management, networking, or operational controls
Troubleshot production issues across application, infrastructure, networking, and deployment layers
Worked in environments where reliability, security, ambiguity, and speed all matter
Made technical decisions that balanced immediate business needs with long-term scalability, reliability, and maintainability

Nice-to-Haves

Built or scaled infrastructure in health tech, logistics, marketplace, fintech, or other operationally complex environments
Worked in mid- or growth-stage startups where speed, ambiguity, and pragmatic decision-making were required
Improved security posture in a practical, engineering-friendly way
Helped establish reliability standards, incident response practices, or platform patterns across an engineering org
Comfortable working directly with product engineers, data teams, operations, security stakeholders, and technical leadership
Mentored engineers and raised the operational bar across a broader engineering team
Worked in regulated environments, with an understanding of privacy, security, and compliance best practices
People management experience or interest in growing into broader technical leadership over time

Technology Stack

Terraform and infrastructure-as-code tooling
AWS, GCP
TypeScript, Python, Bash
CI/CD systems
Monitoring, logging, and observability platforms
Identity, access, and secrets management systems
Cloud networking and infrastructure tooling
Container and deployment systems
Serverless AWS (AppSync, DynamoDB, Lambda, Amplify, CloudFormation)
Node, GraphQL, React Native, React Native for Web

Compensation & Benefits

Meaningful pre-IPO equity
Medical, dental, and vision plans 100% paid for you and your dependents
Flexible PTO + 10 paid holidays per year
401(k) with match
16 weeks of parental leave for birthing parents, 8 weeks for all other parents
HSA + FSA contributions
Life insurance, plus short- and long-term disability coverage
Free daily lunch in-office
Annual learning stipend
Relocation assistance

Skills

Terraform, AWS, GCP, Python, TypeScript, Bash, CI/CD, Observability, Infrastructure as Code, Cloud Security, Secrets Management, Incident Response, Monitoring, Logging, Alerting
