Senior Staff Software Engineer, Managed Orchestration

238k – 288k · San Francisco, CA · Onsite · 10+ YOE · Apr 15

Summary

Leads architecture and development of scalable managed Kubernetes and AI orchestration systems, providing technical direction for cloud infrastructure reliability and performance. Requires 10+ years in software engineering with deep expertise in Go, Kubernetes, and large-scale systems.

About the role

What You'll Be Working On

Drive the development of scalable, resilient, and high-performance software solutions, ensuring alignment with and influence over the strategic objectives outlined in the Crusoe Cloud roadmap
Provide technical leadership across multiple teams, fostering a culture of innovation, engineering excellence, and accountability while enabling teams to deliver cutting-edge cloud solutions
Define and evolve architectural standards and best practices, ensuring consistency, scalability, and long-term maintainability across systems
Continuously stay ahead of emerging trends and technologies in cloud software, proactively shaping Crusoe's technical direction and incorporating innovations that maintain competitive advantage
Act as a mentor and multiplier for engineering talent, elevating team capabilities through coaching, design reviews, and thought leadership in technical discussions
Lead cross-functional initiatives and drive alignment between engineering, product, and infrastructure teams to deliver cohesive and impactful solutions

What You'll Bring to the Team

10+ years of experience working in software engineering, with deep expertise in Systems Engineering and large-scale distributed systems
3+ years of programming experience in Go, with a track record of delivering production-grade systems
Extensive experience with Kubernetes and Linux Engineering, including advanced debugging and performance optimization
Highly skilled in infrastructure as code, with a strong understanding of complex systems-level challenges at scale
Experience with Terraform and GCP (preferred), with the ability to influence platform-level decisions
Strong understanding of Argo, CI/CD, and Automated Testing pipelines, including designing and scaling them for large organizations
Can architect, build, and evolve Kubernetes operators and controllers, owning critical components that ensure the reliability, scalability, and efficiency of the Kubernetes environment
Experience designing and operating large-scale systems comparable to leading services like Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS)
Can lead and deliver critical, high-impact projects, driving initiatives across networking, quality control, automation, and system reliability at an organizational level
Can define and own system architecture end-to-end, including CI/CD pipelines, ensuring scalability, security, and long-term sustainability
Exceptional communication skills, with the ability to influence technical and non-technical stakeholders and drive alignment across the organization

Compensation

Compensation will be paid in the range of $237,600 - $288,000 + Bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Skills

Go, Kubernetes, Linux, Terraform, GCP, Argo, CI/CD, Kubernetes Operators, Infrastructure as Code, Distributed Systems
