Software Engineer, Infrastructure

180k – 350k | San Francisco, CA | DevOps / SRE | Onsite | Sep 3

Summary

Builds and operates large-scale infrastructure, including GPU clusters, Kubernetes orchestration, AWS batch jobs, and observability tooling, to power AI search systems. Requires experience with massive-scale systems and a focus on reliability and optimization.

About the role

Desired Experience

Experience designing and operating large-scale infrastructure: GPU clusters, large Kubernetes clusters, or cloud batch job systems
Obsessive mindset: always thinking about reliability, observability, and optimization across the entire stack

Example Projects

Build the Kubernetes orchestration on a $20M GPU cluster
Scale our AWS batch job system to handle MapReduce jobs across tens of thousands of machines
Design GPU scheduling software that maximizes our cluster utilization
Build observability into our production systems

Skills

Kubernetes, Rust, AWS, Ray, GPU, Observability, MapReduce
