Staff Network Deployment Engineer, Lab

Leads physical and logical deployment of high-performance networks for GPU compute lab clusters, including build-outs, testing, automation, troubleshooting, and maintenance. Requires 8+ years in data center network engineering, expertise in Arista/Juniper/NVIDIA platforms, BGP/EVPN-VXLAN, Python/Ansible.

193k – 234k · San Francisco, CA · DevOps / SRE · Onsite · 8+ YOE

About the role

What You’ll Be Working On

  • Execute new product build-outs: Lead the end-to-end deployment of network infrastructure for our Crusoe lab.
  • Bridge Design and Reality: Take high-level designs from the Network Development team and translate them into implementation plans, cable maps, and configuration templates.
  • Validate new equipment: Perform rigorous burn-in testing and site acceptance testing (SAT) for new network clusters.
  • Support optimizing deployment automation: Use Python, Ansible, and ZTP (Zero Touch Provisioning) to automate the staging and configuration of network devices.
  • Lab management: Work with remote hands and cabling vendors to ensure physical layer standards (fiber paths, power requirements, and cooling) meet Crusoe’s stringent HPC requirements.
  • Manage a lab environment consisting of the latest GPU systems and networking devices.
  • Work closely with OEMs for networking and data center infrastructure components.
  • Perform diagnosis and troubleshooting of hardware faults for network systems.
  • Support networking for GPU platforms including NVIDIA A100, H200, GB200, B200, and B300, and AMD MI350X / MI355X.
  • Execute component-level diagnosis and remediation for failed or degraded network hardware.
  • Partner with data center operations to manage and perform field-replaceable unit (FRU) repairs for networking systems.
  • Perform firmware and BIOS upgrades.
  • Maintain detailed documentation of maintenance activities, failures, and resolutions in ticketing and asset management systems.
  • For lab operations, develop and update standard operating procedures (SOPs) for troubleshooting, repair, and validation workflows.
  • Collaborate with engineering, software, and data center operations teams to identify root causes of systemic failures and document remediations.
  • Participate in a rotating infrastructure on-call schedule (about one week every 4–6 weeks) with daytime coverage and handoff to the Europe team.

What You’ll Bring to the Team

  • 8+ years of experience in network engineering with a heavy focus on large-scale data center deployments and infrastructure projects.
  • Mastery of Physical Layer Standards: Expert knowledge of structured cabling (SMF/MMF, MPO/MTP), optical transceivers (400G/800G), and data center power/cooling requirements.
  • Strong Routing and Switching Knowledge: Hands-on experience configuring Arista (EOS), Juniper (Junos), and NVIDIA/Mellanox platforms in a leaf-spine architecture.
  • Protocol Proficiency: Solid understanding of BGP and EVPN-VXLAN as they relate to large-scale fabric provisioning.
  • Automation-First Mindset: Proficiency in Python and Ansible for automating repetitive deployment tasks and validating configuration state.
  • Logistical Excellence: Proven ability to manage multiple complex projects simultaneously across different time zones and physical locations.
  • Troubleshooting Expertise: Ability to diagnose complex physical layer and link-layer issues using OTDRs, light meters, and packet captures.
  • Education: Bachelor’s degree in a technical field or equivalent practical experience in hyperscale or ISP environments.

Compensation

Compensation will be paid in the range of $193,000 - $234,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Skills

Arista EOS, Juniper Junos, NVIDIA Mellanox, BGP, EVPN-VXLAN, Python, Ansible, Zero Touch Provisioning, SMF, MMF, MPO, MTP, OTDR, NVIDIA A100, NVIDIA H200
