Staff Infrastructure Engineer
The Staff Infrastructure Engineer manages cloud infrastructure operations, develops automation for server provisioning, scales deployments, troubleshoots GPU hardware, and leads the transition to Kubernetes. The role requires strong Linux, hardware, and Kubernetes expertise.
What You'll Be Doing
- Manage and maintain day-to-day operations of Crusoe’s cloud infrastructure.
- Develop automation tools to streamline server provisioning and reduce SLA times.
- Scale infrastructure to support mass deployments (80-100 servers simultaneously).
- Troubleshoot hardware issues, especially with GPUs, and liaise with vendors.
- Transition Crusoe’s environment to Kubernetes and containerized workflows.
What You’ll Bring to the Team
- Solid hardware experience and GPU troubleshooting expertise.
- Strong Linux background.
- Knowledge of PXE booting and server provisioning (bare metal).
- Experience with BMC/IPMI, BIOS, and enterprise-grade server management.
- Kubernetes proficiency (admin or developer).
- Familiarity with containerization technologies (Docker preferred).
- Experience with version control systems (GitLab).
Nice to haves:
- Experience with MAAS.
- Proficiency in Python or Golang (Golang preferred).
- Kubernetes administration and deployment experience.
- Experience with Ansible and Terraform.
Compensation
$208,000 - $253,000 + Bonus. Restricted Stock Units are included in all offers.