Staff / Senior Software Engineer, Infrastructure

Builds and operates scalable infrastructure systems, including Kubernetes clusters, distributed databases, and cloud services, to support an AI music platform at consumer scale. Requires 5+ years of experience in infrastructure engineering, with strong ownership and scaling expertise.

220k – 280k | Cambridge, MA | DevOps / SRE | Onsite | 5+ YOE

About the role

What You’ll Do

  • Architect and build services to handle massive consumer traffic, data, and usage
  • Design systems that are performant, secure, scalable, and easy to observe
  • Own systems end-to-end — from design and implementation through deployment, monitoring, and operational excellence
  • Lead by example on engineering excellence — code quality, system design, documentation, and operational maturity
  • Collaborate with engineering teams across the company to understand their needs and build the right abstractions
  • Communicate proactively and with high bandwidth — concise and information-dense async updates, stakeholder alignment, low entropy
  • Operate with ambiguity — scope what matters, make tradeoffs, and drive projects forward independently

What You’ll Need (Required)

  • 5–7+ years of infrastructure, backend, or systems engineering experience
  • Experience building and operating systems at significant scale in production
  • Strong understanding of distributed systems, cloud services (AWS/GCP), and modern infrastructure patterns
  • Experience with some combination of: Kubernetes, Docker, infrastructure as code (Pulumi/Terraform/CDK), databases (Postgres, distributed relational databases), caching systems, or container orchestration
  • Ability to reason through hard scaling, reliability, and performance problems with clear technical judgment
  • High ownership — you drive projects end-to-end without waiting for direction
  • Strong communication skills — you keep stakeholders informed and reduce ambiguity for the teams you serve
  • An obsession with engineering excellence, iterating and learning rapidly, and working hard

Strong Plusses

  • Deep experience with Kubernetes at scale — cluster management, control plane scaling, multi-tenancy
  • Experience with large-scale databases, distributed data layers, or storage systems
  • Experience with ML infrastructure — inference serving, ML data pipelines, MLOps, GPU infrastructure
  • Experience on a platform or developer experience team where your primary customers were other engineers
  • Experience building internal systems 0→1 (auth, notifications, CDN, DevEx tooling, or similar)
  • Strong on-call instincts: triage, debug, and resolve incidents across a distributed stack
  • Hands-on familiarity with AI tooling and the current landscape of AI for software engineering — models, agents, coding assistants, and agentic workflows
  • Golang or Rust experience, especially for large-scale systems
  • Experience with websockets, CDNs, streaming traffic patterns, and audio/video delivery
  • Security best practices in building and scaling infrastructure
  • Technical leadership or management experience

Skills

Kubernetes, Docker, AWS, GCP, Terraform, Pulumi, Postgres, Go, Rust, Distributed Systems
