Staff Engineer, Command Center Insights & Actions

210k – 255k | San Francisco, CA | Backend Engineering | Onsite | 5+ YOE
Summary

Staff Engineer owning detection systems for Crusoe's Command Center platform. Defines heuristics, thresholds, and anomaly detection rules that translate infrastructure telemetry into actionable signals. Requires 5+ years of experience shipping production software in Go, Rust, C++, Java, or a similar language.

About the role

What You'll Be Working On

Detection & Intelligence Ownership

  • Own the full detection stack — heuristics, threshold calibration, precision/recall tuning, and the rule systems that define what "something is wrong" means for the platform

Anomaly Detection Pipelines

  • Design and maintain detection systems including straggler node detection, GPU health signals, and fleet-level behavioral baselines

Signal Calibration

  • Drive detection fidelity by reducing false positives, increasing signal coverage, and building feedback loops that keep thresholds accurate as the fleet grows

ML/RL Integration

  • Evaluate and integrate machine learning and reinforcement learning techniques where they outperform rule-based approaches — and know when not to reach for a model

Product Engineering

  • Ship customer-facing features end-to-end across the Command Center Insights & Actions (CCIA) stack: alert rule engine, control plane APIs, automated action systems, and insights delivery surfaces

0-to-1 & Scale

  • Build new systems from scratch and scale existing ones to support Crusoe's rapidly growing global fleet

Cross-Functional Collaboration

  • Work closely with product counterparts to shape requirements early and partner with the data science team to develop and validate detection models

System Design

  • Participate in design discussions across teams, contribute architectural perspective, and help evaluate technical trade-offs

Technical Mentorship

  • Mentor engineers at all levels through code review, design feedback, and direct coaching, and contribute to hiring by helping define what great looks like

What You'll Bring to the Team

Anomaly Detection & Heuristics Expertise

  • Deep experience building anomaly detection systems, heuristics-based rule engines, or ML/RL systems for infrastructure or data-intensive domains

Threshold & Signal Calibration

  • Demonstrated ability to reason about precision/recall trade-offs and build feedback loops that keep detection systems accurate over time

Distributed Systems Fundamentals

  • Strong foundations in the building blocks of reliable, scalable backend systems

Full Software Engineering Craft

  • 5+ years shipping production software; experience with modern compiled or systems languages (Go, Rust, C++, Java, or similar)

Data & Observability Fluency

  • Comfortable with time-series data, telemetry pipelines, and observability primitives

Communication

  • Ability to explain detection logic, trade-offs, and system behavior clearly to both engineers and non-technical partners

Force Multiplier Mindset

  • Make the team better through mentorship, clear technical vision, and genuine investment in the people around you

Bonus Points

  • Experience with GPU profiling tools (Nsight, NCCL Inspector) or hardware-level infrastructure diagnostics
  • Background in observability platforms or products
  • Experience with reinforcement learning applied to operational or infrastructure problems
  • Familiarity with large-scale fleet management or cloud infrastructure
  • Passion for building team culture and engineering quality of life

Benefits

  • Competitive compensation and equity packages
  • Restricted Stock Units
  • Paid time off, paid holidays & leave of absence programs
  • Comprehensive health, dental & vision insurance
  • Employer contributions to HSA
  • Paid parental leave
  • Paid life insurance, short-term and long-term disability
  • Professional development & tuition reimbursement
  • Mental health & wellness support
  • Commuter benefits (parking & transit)
  • Cell phone stipend
  • 401(k) Retirement plan with company match up to 4% of salary
  • Volunteer time off
  • Global travel insurance & emergency assistance
  • Daily meal allowance
Skills
Go, Rust, C++, Java, Anomaly Detection, Machine Learning, Reinforcement Learning, Distributed Systems, Time-series Data, Telemetry Pipelines, Observability, GPU Profiling, Nsight, NCCL