Senior Staff Network Engineer, Automation
Senior technical leader owning Crusoe's network automation platform, source of truth, intent-based config systems, and self-healing workflows across hyperscale multi-vendor fabrics. Requires 12+ years of production network automation experience with deep expertise in Python/Go, model-driven telemetry, and observability at 10K+ device scale.
What You'll Be Working On
- Own the Network Automation Platform: Define the technical roadmap for Crusoe's automation stack, from source of truth and config generation through day-2 operations, telemetry, and closed-loop remediation across our global fleet.
- Build the Source of Truth: Design and own the authoritative data model (NetBox, Nautobot, or equivalent) that drives all network configuration, validation, and operational state across teams.
- Architect Intent-Based Configuration Systems: Lead the design and delivery of declarative, model-driven configuration pipelines using Python, Nornir, Ansible, Jinja2, and CI/CD, treating the network as code and making configuration drift impossible.
- Drive Model-Driven Automation: Own Crusoe's gNMI, OpenConfig, and NETCONF/YANG strategy for telemetry collection, configuration management, and state validation across multi-vendor fabrics (Arista, Juniper, NVIDIA/Mellanox).
- Build Self-Healing Workflows: Design and ship event-driven, auto-remediation systems that detect faults, correlate telemetry, and resolve known failure modes without human escalation.
- Define the Observability Platform: Set the technical direction for Crusoe's telemetry, metrics, alerting, and dashboarding stack including Prometheus, Grafana, and streaming gNMI.
- Influence Architecture for Automatability: Partner with Network Architecture to ensure designs are automation-first from day one: deployable, validatable, and operable programmatically at scale.
- Mentor and Multiply: Provide technical guidance to Staff and Senior engineers. Drive code reviews, design reviews, and platform architecture decisions that raise the engineering bar across the org.
What You'll Bring to the Team
- 12+ years of network engineering experience with a demonstrated focus on production network automation, platform engineering, and infrastructure as code in hyperscale or internet-scale environments.
- Demonstrated Technical Leadership: Proven track record of designing and shipping network automation platforms used by a broader engineering organization: not scripts, but systems others build on.
- Production-Quality Software Engineering: Mastery of Python or Go at a platform level, writing code that is testable, CI/CD-integrated, and production-owned.
- Model-Driven Automation Fluency: Deep hands-on experience with gNMI, OpenConfig, NETCONF, and YANG-modeled configuration and telemetry.
- Source of Truth Ownership: You have designed or owned a network source of truth platform (NetBox, Nautobot, or equivalent) end to end, including the data model, integrations, and CI/CD pipelines that consume it.
- Network Domain Depth: Strong hands-on expertise with Arista (EOS), Juniper (Junos), and NVIDIA/Mellanox platforms in leaf-spine architectures. Solid knowledge of BGP, EVPN-VXLAN, and LLDP at DC fabric scale.
- Event-Driven and Self-Healing Systems: Track record of building auto-remediation and closed-loop automation that detects, correlates, and resolves faults without human intervention.
- Observability Expertise: Hands-on experience building streaming telemetry and observability platforms using gNMI collectors, Prometheus, Grafana, and equivalent tooling at fleet scale.
- Hyperscale Operational Context: Comfort operating at scale across 10K+ network devices and multi-region fabric fleets.
Benefits
- Competitive compensation and equity packages
- Restricted Stock Units
- Paid time off, paid holidays & leave of absence programs
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) Retirement plan with company match up to 4% of salary
- Volunteer time off
- Global travel insurance & emergency assistance
- Daily meals allowance
- Additional perks & programs specific to location
Compensation will be paid in the range of $245,000 - $295,000 + Bonus. Restricted Stock Units are included in all offers.