Software Engineer, Infrastructure
Builds and operates large-scale infrastructure including GPU clusters, Kubernetes orchestration, AWS batch jobs, and observability tooling to power AI search systems. Requires experience with massive-scale systems and a focus on reliability and optimization.
Desired Experience
- Experience designing and operating large-scale infrastructure: GPU clusters, large Kubernetes clusters, or cloud batch job systems
- Obsessive mindset: always thinking about reliability, observability, and optimization across the entire stack
Example Projects
- Build the Kubernetes orchestration on a $20M GPU cluster
- Scale our AWS batch job system to handle MapReduce jobs over tens of thousands of machines
- Design GPU scheduling software so we max out our cluster utilization
- Build observability into our production systems
Senior Manager, DevOps
Lead DevOps strategy and team to improve engineering velocity, platform reliability, and operational efficiency across multi-cloud (AWS/GCP) environments. Drive IaC, Kubernetes delivery, observability, AI-powered tooling adoption, and cross-functional collaboration.
Software Engineer, Dev Velocity
Build internal developer platform, tooling, and automation to accelerate engineering velocity. Focus on CI/CD pipelines, test infrastructure, build systems, and metrics to help engineers ship faster and more reliably.
Senior Software Engineer, Observability
Senior engineer on the Auth0 Platform Observability team responsible for designing, building, and maintaining scalable observability infrastructure (metrics, logs, traces) using Datadog, Terraform, and OpenTelemetry.
Senior Software Engineer - Developer Platform
Senior engineer building and scaling internal developer platforms with strong focus on AI tooling, reliability, and developer experience. Requires 4+ years in backend/infrastructure and proven project leadership.