Senior Site Reliability Engineer

As a Senior Site Reliability Engineer, you will build and shape the foundation for reliability, observability, and scalability across David AI's infrastructure. You will own the observability stack, design scalable cloud infrastructure, and lead improvements in deployment pipelines and incident response.

160k – 220k | San Francisco, CA | DevOps / SRE | Onsite | 5+ YOE

About the role

As a Senior Site Reliability Engineer at David AI, you will shape and build the foundation for reliability, observability, and scalability across David AI's infrastructure. Working closely with our engineering and product teams, you’ll help ensure our systems are resilient, efficient, and designed to scale as the company grows.

In this role, you will

Own David AI’s observability stack, including monitoring, alerting, logging, and tracing, to provide engineers with clear visibility into system health, reliability, and performance.
Partner closely with product and platform engineering teams to design systems that are scalable, resilient, and reliable from day one, not as an afterthought.
Design and implement secure, scalable cloud infrastructure across AWS using Terraform and modern DevOps tooling to support rapid product and research iteration.
Lead improvements across deployment pipelines, CI/CD systems, and incident response processes to reduce downtime, improve operational efficiency, and increase engineering velocity.
Define and evolve the foundation of SRE practices at David AI, influencing reliability culture, tooling standards, operational excellence, and best practices across the engineering organization.

Your background looks like

5+ years of experience in Site Reliability, Infrastructure, or Platform Engineering supporting large-scale SaaS or cloud systems.
Hands-on experience applying security best practices in production systems and cloud infrastructure.
Strong experience building and running reliable, highly available, and scalable systems.
Hands-on experience with AWS, Terraform, containers and container orchestration (e.g., Kubernetes), and cloud networking basics.
Experience implementing and maintaining observability tooling across monitoring, logging, alerting, and tracing (e.g., Prometheus, Grafana, Datadog, or similar).
Comfortable working in fast-paced teams and collaborating closely with product, ML, and engineering teams.
Bachelor’s degree in Computer Science or related field, or equivalent practical experience.

Bonus points if you have

Past experience in an early-stage startup environment, especially defining SRE culture and tooling from scratch.
Familiarity with incident management automation or self-healing infrastructure patterns.

Some technologies we work with

Next.js, TypeScript, TailwindCSS, Node.js, tRPC, PostgreSQL, AWS, Temporal, WebRTC, FFmpeg.

Benefits

Unlimited PTO.
Top-notch health, dental, and vision coverage with 100% coverage for most plans.
FSA & HSA access.
401k access.
Meals 2x daily through DoorDash + snacks and beverages available at the office.
Unlimited company-sponsored Barry’s classes.

Skills

AWS, Terraform, Kubernetes, Prometheus, Grafana, Datadog, Next.js, TypeScript, Node.js, Postgres
