
Staff Software Engineer, Network Automation

215k – 260k | San Francisco, CA | Sunnyvale, CA | Seattle, WA | DevOps / SRE | Onsite | 8+ YOE
Summary

Design and deliver automation frameworks, observability platforms, and self-healing workflows for Crusoe's global network fleet. Requires 8+ years of network engineering experience, strong Python or Go skills, and expertise in model-driven automation.

About the role

What You'll Be Working On

Network Automation Platform

  • Contribute to the technical roadmap for Crusoe's automation stack, from source of truth and config generation through day-2 operations and closed-loop remediation across our global fleet

Source of Truth

  • Help design and maintain the authoritative data model (NetBox, Nautobot, or equivalent) that drives network configuration, validation, and operational state across teams

Intent-Based Configuration Pipelines

  • Build and maintain declarative, model-driven configuration systems using Python, Nornir, Ansible, and Jinja2, treating the network as code and eliminating configuration drift

Model-Driven Automation

  • Contribute to Crusoe's gNMI, OpenConfig, and NETCONF/YANG strategy for telemetry collection, configuration management, and state validation across multi-vendor fabrics

Self-Healing Workflows

  • Build and maintain event-driven auto-remediation systems that detect faults, correlate telemetry, and resolve known failure modes without human escalation

Observability Platform

  • Help build and improve Crusoe's telemetry, metrics, alerting, and dashboarding stack including Prometheus, Grafana, and streaming gNMI collectors

Architecture Partnership

  • Work closely with Network Architecture to ensure designs are automation-first — deployable, validatable, and operable programmatically at scale

What You'll Bring to the Team

  • 8+ years of network engineering experience with a demonstrated focus on production network automation and infrastructure as code
  • Production-quality software engineering skills in Python or Go, with CI/CD integration and platform-level thinking
  • Hands-on experience with model-driven automation including gNMI, OpenConfig, NETCONF, and YANG
  • Experience contributing to or owning a network source of truth platform such as NetBox or Nautobot
  • Strong knowledge of Arista (EOS) and/or Juniper (Junos) in leaf-spine DC fabric environments
  • Solid understanding of BGP, EVPN-VXLAN, and LLDP at data center scale
  • Experience building or contributing to observability platforms using Prometheus, Grafana, and streaming telemetry tooling

Bonus Points

  • Experience with NVIDIA/Mellanox platforms in production environments
  • Familiarity with operating at fleet scale across thousands of network devices in multi-region environments
  • Exposure to closed-loop, event-driven automation and auto-remediation systems
  • Experience in hyperscale or internet-scale infrastructure (cloud providers, large CDNs, or AI/ML infrastructure companies)

Benefits

  • Competitive compensation and equity packages
  • Restricted Stock Units
  • Paid time off, paid holidays & leave of absence programs
  • Comprehensive health, dental & vision insurance
  • Employer contributions to HSA
  • Paid parental leave
  • Paid life insurance, short-term and long-term disability
  • Professional development & tuition reimbursement
  • Mental health & wellness support
  • Commuter benefits (parking & transit)
  • Cell phone stipend
  • 401(k) Retirement plan with company match up to 4% of salary
  • Volunteer time off
  • Global travel insurance & emergency assistance
  • Daily meals allowance
  • Additional perks & programs specific to location
Skills
Python, Go, Nornir, Ansible, Jinja2, gNMI, OpenConfig, NETCONF, YANG, NetBox, Nautobot, Arista EOS, Juniper Junos, BGP, EVPN-VXLAN