Senior Site Reliability Engineer

130k – 140k | United States | Remote | 5+ YOE | Jun 10

Summary

Senior SRE responsible for incident response, infrastructure reliability, database operations, and scaling production systems on AWS and Kubernetes.

About the role

Responsibilities

Act as a first responder for system incidents and outages
Own and evolve monitoring, alerting, and log management systems
Manage and optimize database infrastructure (MySQL, PostgreSQL, ClickHouse, Redis)
Maintain and improve server infrastructure and deployment pipelines
Collaborate with engineering teams to build scalable, resilient systems
Contribute to internal SRE tooling and automation efforts

Requirements

Deep expertise with AWS and Kubernetes
5+ years of experience in a Site Reliability, DevOps, or Infrastructure Engineering role
Proven experience scaling production systems in a high-growth environment
Practical experience using AI tools to improve engineering productivity
Experience scaling an early-stage product to 1M+ monthly active users
Experience managing incident response and production system outages
Hands-on experience with database operations and optimization
Familiarity with observability tooling, monitoring, and logging best practices
Based in North or South America (AMER region)

Nice-to-Haves

Experience with SOC2 compliance or building secure infrastructure
Experience with ClickHouse or similar technologies

Compensation & Benefits

$130,000 - $140,000 USD per year
Fully remote
35 days of PTO annually + paid sabbatical after 5 years
100% medical coverage for you and your family (or reimbursement)
Parental leave
Home office stipend
Learning & development stipend
Annual bonus potential
Company retreats twice a year

Skills

AWS, Kubernetes, MySQL, PostgreSQL, ClickHouse, Redis, Monitoring, Alerting, Log Management, Observability

