Software Engineer, Infrastructure

180k – 350k | San Francisco, CA | DevOps / SRE | Onsite
Summary

Builds and operates large-scale infrastructure, including GPU clusters, Kubernetes orchestration, AWS batch jobs, and observability tooling, to power AI search systems. Requires experience with massive-scale systems and a focus on reliability and optimization.

About the role

Desired Experience

  • Experience designing and operating large-scale infrastructure: GPU clusters, large Kubernetes clusters, or cloud batch job systems
  • Obsessive mindset: always thinking about reliability, observability, and optimization across the entire stack

Example Projects

  • Build the Kubernetes orchestration on a $20m GPU cluster
  • Scale our AWS batch job system to handle MapReduce jobs across tens of thousands of machines
  • Design GPU scheduling software so we max out our cluster utilization
  • Build observability into our production systems
Skills
Kubernetes, Rust, AWS, Ray, GPU, Observability, MapReduce
Similar roles at this salary range
Pindrop

Senior Manager, DevOps

Lead DevOps strategy and team to improve engineering velocity, platform reliability, and operational efficiency across multi-cloud (AWS/GCP) environments. Drive IaC, Kubernetes delivery, observability, AI-powered tooling adoption, and cross-functional collaboration.

155k – 185k | United States | DevOps / SRE | Remote | 6+ YOE | Go | AWS
Render

Software Engineer, Dev Velocity

Build internal developer platform, tooling, and automation to accelerate engineering velocity. Focus on CI/CD pipelines, test infrastructure, build systems, and metrics to help engineers ship faster and more reliably.

170k – 290k | United States | DevOps / SRE | Remote | 5+ YOE | Go | CI/CD
Airbnb

Senior Software Engineer, Dev Tools

Senior engineer building and operating cloud dev environments, Kubernetes platforms, and tooling for engineers and AI agents at Airbnb. Requires 5-9+ years building high-scale distributed systems on AWS.

196k – 230k | United States | DevOps / SRE | Remote | 5+ YOE | Go | AWS
Okta

Senior Software Engineer, Observability

Senior engineer on the Auth0 Platform Observability team responsible for designing, building, and maintaining scalable observability infrastructure (metrics, logs, traces) using Datadog, Terraform, and OpenTelemetry.

147k – 202k | Bellevue, WA +3 | DevOps / SRE | Hybrid | 5+ YOE | AWS | Azure
Pave

Senior Software Engineer - Developer Platform

Senior engineer building and scaling internal developer platforms with strong focus on AI tooling, reliability, and developer experience. Requires 4+ years in backend/infrastructure and proven project leadership.

196k – 265k | San Francisco, CA +1 | DevOps / SRE | Hybrid | 4+ YOE | GCP | Node.js