Staff Site Reliability Engineer

150k – 210k | United States | DevOps / SRE | Remote | 7+ YOE
Summary

Founding Staff SRE for Kong's internal developer platform (Volcano). Define reliability posture, build multi-region Kubernetes infrastructure, establish GitOps/CI-CD, and scale managed data services.

About the role

What You'll Do

  • Own reliability for Volcano end-to-end: Define and drive SLOs, error budgets, and incident response practices for all Volcano services — edge deployments, managed Postgres, auth, realtime, storage, and the control plane.
  • Architect the platform's infrastructure: Design and build the multi-region Kubernetes infrastructure, networking, and data plane that powers Volcano's edge deployment pipeline and backend-as-a-service capabilities.
  • Build the GitOps and CI/CD backbone: Establish deployment automation, canary pipelines, and preview environment provisioning using ArgoCD, Helm, and Terraform/Terragrunt — setting patterns the broader team will follow.
  • Scale managed data services: Design, operate, and harden multi-tenant PostgreSQL clusters, Redis caching layers, and object storage — with a focus on data isolation, performance, and disaster recovery.
  • Drive observability from day one: Instrument every Volcano service with meaningful SLIs; build dashboards, alerts, and runbooks using Datadog, Prometheus, and Grafana before services go live, not after incidents.
  • Lead cross-functional reliability work: Collaborate with the OCTO team, product engineering, and security to bake reliability and compliance into Volcano's architecture — not bolt it on later.
  • Set SRE culture and standards: Mentor engineers across Volcano's contributing teams on reliability principles; lead postmortems, define on-call practices, and build a blameless engineering culture.
  • Evaluate and adopt emerging technologies: Given Volcano's greenfield nature, evaluate and make architectural decisions on edge runtimes, serverless compute, vector databases, and AI-native infrastructure components.

What You'll Bring

  • BS in Computer Science or equivalent; substantial experience at Staff or Principal IC level in SRE/Platform Engineering.
  • Proven track record building SRE or platform engineering practices for developer-facing platforms or PaaS/SaaS products — ideally at greenfield stage.
  • Deep Kubernetes expertise: multi-tenant cluster design, networking (CNI, service mesh, ingress), autoscaling, and security hardening.
Skills
Kubernetes, SRE, Platform Engineering, ArgoCD, Helm, Terraform, Terragrunt, PostgreSQL, Redis, Datadog, Prometheus, Grafana, GitOps, CI/CD, Multi-region infrastructure