Staff Software Engineer, Network Automation

215k – 260k · San Francisco, CA · Sunnyvale, CA · Seattle, WA · DevOps / SRE · Onsite · 8+ YOE · Jun 9

Summary

Design and deliver automation frameworks, observability platforms, and self-healing workflows for Crusoe's global network fleet. Requires 8+ years of network engineering experience, strong Python/Go skills, and expertise in model-driven automation.

About the role

What You'll Be Working On

Network Automation Platform

Contribute to the technical roadmap for Crusoe's automation stack, from source of truth and config generation through day-2 operations and closed-loop remediation across our global fleet

Source of Truth

Help design and maintain the authoritative data model (NetBox, Nautobot, or equivalent) that drives network configuration, validation, and operational state across teams

Intent-Based Configuration Pipelines

Build and maintain declarative, model-driven configuration systems using Python, Nornir, Ansible, and Jinja2, treating the network as code and eliminating configuration drift

Model-Driven Automation

Contribute to Crusoe's gNMI, OpenConfig, and NETCONF/YANG strategy for telemetry collection, configuration management, and state validation across multi-vendor fabrics

Self-Healing Workflows

Build and maintain event-driven auto-remediation systems that detect faults, correlate telemetry, and resolve known failure modes without human escalation

Observability Platform

Help build and improve Crusoe's telemetry, metrics, alerting, and dashboarding stack, including Prometheus, Grafana, and streaming gNMI collectors

Architecture Partnership

Work closely with Network Architecture to ensure designs are automation-first: deployable, validatable, and operable programmatically at scale

What You'll Bring to the Team

8+ years of network engineering experience with a demonstrated focus on production network automation and infrastructure as code
Production-quality software engineering skills in Python or Go, with CI/CD integration and platform-level thinking
Hands-on experience with model-driven automation including gNMI, OpenConfig, NETCONF, and YANG
Experience contributing to or owning a network source of truth platform such as NetBox or Nautobot
Strong knowledge of Arista (EOS) and/or Juniper (Junos) in leaf-spine DC fabric environments
Solid understanding of BGP, EVPN-VXLAN, and LLDP at data center scale
Experience building or contributing to observability platforms using Prometheus, Grafana, and streaming telemetry tooling

Bonus Points

Experience with NVIDIA/Mellanox platforms in production environments
Familiarity with operating at fleet scale across thousands of network devices in multi-region environments
Exposure to closed-loop, event-driven automation and auto-remediation systems
Experience in hyperscale or internet-scale infrastructure (cloud providers, large CDNs, or AI/ML infrastructure companies)

Benefits

Competitive compensation and equity packages
Restricted Stock Units
Paid time off, paid holidays & leave of absence programs
Comprehensive health, dental & vision insurance
Employer contributions to a health savings account (HSA)
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) retirement plan with company match up to 4% of salary
Volunteer time off
Global travel insurance & emergency assistance
Daily meal allowance
Additional perks & programs specific to location

Skills

Python, Go, Nornir, Ansible, Jinja2, gNMI, OpenConfig, NETCONF, YANG, NetBox, Nautobot, Arista EOS, Juniper Junos, BGP, EVPN-VXLAN
