What You’ll Be Working On
- Execute new product build-outs: Lead the end-to-end deployment of network infrastructure for our Crusoe lab.
- Bridge Design and Reality: Take high-level designs from the Network Development team and translate them into implementation plans, cable maps, and configuration templates.
- Validate new equipment: Perform rigorous "Burn-in" testing and site acceptance testing (SAT) for new network clusters.
- Support and optimize deployment automation: Use Python, Ansible, and ZTP (Zero Touch Provisioning) to automate the staging and configuration of network devices.
- Lab management: Work with remote hands and cabling vendors to ensure physical layer standards (fiber paths, power requirements, and cooling) meet Crusoe’s stringent HPC requirements.
- Manage a lab environment consisting of the latest GPU systems and networking devices.
- Work closely with OEMs for networking and data center infrastructure components.
- Perform diagnosis and troubleshooting of hardware faults for network systems.
- Support networking for GPU platforms including NVIDIA A100, H200, GB200, B200, and B300 and AMD MI350X / MI355X.
- Execute component-level diagnosis and remediation for failed or degraded network hardware.
- Partner with data center operations to manage and perform field-replaceable unit (FRU) repairs for networking systems.
- Perform firmware and BIOS upgrades.
- Maintain detailed documentation of maintenance activities, failures, and resolutions in ticketing and asset management systems.
- For lab operations, develop and update standard operating procedures (SOPs) for troubleshooting, repair, and validation workflows.
- Collaborate with engineering, software, and data center operations teams to identify root causes of systemic failures and document remediations.
- Participate in a rotating infrastructure on-call schedule (about one week every 4–6 weeks) with daytime coverage and handoff to the Europe team.
What You’ll Bring to the Team
- 8+ years of experience in network engineering with a heavy focus on large-scale data center deployments and infrastructure projects.
- Mastery of Physical Layer Standards: Expert knowledge of structured cabling (SMF/MMF, MPO/MTP), optical transceivers (400G/800G), and data center power/cooling requirements.
- Strong Routing and Switching Knowledge: Hands-on experience configuring Arista (EOS), Juniper (Junos), and NVIDIA/Mellanox platforms in a leaf-spine architecture.
- Protocol Proficiency: Solid understanding of BGP and EVPN-VXLAN as they relate to large-scale fabric provisioning.
- Automation-First Mindset: Proficiency in Python and Ansible for automating repetitive deployment tasks and validating configuration state.
- Logistical Excellence: Proven ability to manage multiple complex projects simultaneously across different time zones and physical locations.
- Troubleshooting Expertise: Ability to diagnose complex physical layer and link-layer issues using OTDRs, light meters, and packet captures.
- Education: Bachelor’s degree in a technical field or equivalent practical experience in hyperscale or ISP environments.
Compensation
Compensation will be paid in the range of $193,000 - $234,000 + Bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.