Senior Staff Software Engineer, Managed Orchestration

238k – 288k · San Francisco, CA · Onsite · 10+ YOE
Summary

Leads architecture and development of scalable managed Kubernetes and AI orchestration systems, providing technical direction for cloud infrastructure reliability and performance. Requires 10+ years in software engineering with deep expertise in Go, Kubernetes, and large-scale systems.

About the role

What You'll Be Working On

  • Drive the development of scalable, resilient, and high-performance software solutions, ensuring alignment with and influence over the strategic objectives outlined in the Crusoe Cloud roadmap
  • Provide technical leadership across multiple teams, fostering a culture of innovation, engineering excellence, and accountability while enabling teams to deliver cutting-edge cloud solutions
  • Define and evolve architectural standards and best practices, ensuring consistency, scalability, and long-term maintainability across systems
  • Continuously stay ahead of emerging trends and technologies in cloud software, proactively shaping Crusoe's technical direction and incorporating innovations that maintain competitive advantage
  • Act as a mentor and multiplier for engineering talent, elevating team capabilities through coaching, design reviews, and thought leadership in technical discussions
  • Lead cross-functional initiatives and drive alignment between engineering, product, and infrastructure teams to deliver cohesive and impactful solutions

What You'll Bring to the Team

  • 10+ years of experience working in software engineering, with deep expertise in Systems Engineering and large-scale distributed systems
  • 3+ years of programming experience in GoLang, with a track record of delivering production-grade systems
  • Extensive experience with Kubernetes and Linux Engineering, including advanced debugging and performance optimization
  • Highly skilled in infrastructure as code, with a strong understanding of complex systems-level challenges at scale
  • Experience with Terraform and GCP (preferred), with the ability to influence platform-level decisions
  • Strong understanding of Argo, CI/CD, and Automated Testing pipelines, including designing and scaling them for large organizations
  • Can architect, build, and evolve Kubernetes operators and controllers, owning critical components that ensure the reliability, scalability, and efficiency of the Kubernetes environment
  • Experience designing and operating large-scale systems comparable to leading services like Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS)
  • Can lead and deliver critical, high-impact projects, driving initiatives across networking, quality control, automation, and system reliability at an organizational level
  • Can define and own system architecture end-to-end, including CI/CD pipelines, ensuring scalability, security, and long-term sustainability
  • Exceptional communication skills, with the ability to influence technical and non-technical stakeholders and drive alignment across the organization

Compensation

Compensation will be paid in the range of $237,600 to $288,000, plus bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Skills
Go, Kubernetes, Linux, Terraform, GCP, Argo, CI/CD, Kubernetes Operators, Infrastructure as Code, Distributed Systems