Staff Infrastructure Engineer

208k – 253k | San Francisco, CA | Sunnyvale, CA | Onsite | May 4

Summary

The Staff Infrastructure Engineer manages cloud infrastructure operations, develops automation for server provisioning, scales deployments, troubleshoots GPU hardware, and leads the transition to Kubernetes. The role requires strong Linux, hardware, and Kubernetes expertise.

About the role

What You'll Be Doing

Manage and maintain day-to-day operations of Crusoe’s cloud infrastructure.
Develop automation tools to streamline server provisioning and reduce SLA times.
Scale infrastructure to support mass deployments (80-100 servers simultaneously).
Troubleshoot hardware issues, especially with GPUs, and liaise with vendors.
Transition Crusoe’s environment to Kubernetes and containerized workflows.

What You’ll Bring to the Team

Solid hardware experience and GPU troubleshooting expertise.
Strong Linux background.
Knowledge of PXE booting and server provisioning (bare metal).
Experience with BMC/IPMI, BIOS, and enterprise-grade server management.
Kubernetes proficiency (admin or developer).
Familiarity with containerization technologies (Docker preferred).
Experience with version control systems (GitLab).

Nice to haves:

Experience with MAAS.
Proficiency in Python or Golang (Golang preferred).
Kubernetes administration and deployment experience.
Experience with Ansible and Terraform.

Compensation

$208,000 - $253,000 + Bonus. Restricted Stock Units are included in all offers.

Skills

Kubernetes, Linux, Docker, Ansible, Terraform, Python, Golang, GitLab, IPMI, PXE
