Staff Engineer, Command Center Insights & Actions
Staff Engineer owning detection systems for Crusoe's Command Center platform. Defines heuristics, thresholds, and anomaly detection rules that translate infrastructure telemetry into actionable signals. Ships production features in Go, Rust, C++, or Java; requires 5+ years of experience shipping production software.
What You'll Be Working On
Detection & Intelligence Ownership
- Own the full detection stack — heuristics, threshold calibration, precision/recall tuning, and the rule systems that define what "something is wrong" means for the platform
Anomaly Detection Pipelines
- Design and maintain detection systems including straggler node detection, GPU health signals, and fleet-level behavioral baselines
Signal Calibration
- Drive detection fidelity by reducing false positives, increasing signal coverage, and building feedback loops that keep thresholds accurate as the fleet grows
ML/RL Integration
- Evaluate and integrate machine learning and reinforcement learning techniques where they outperform rule-based approaches — and know when not to reach for a model
Product Engineering
- Ship customer-facing features end-to-end across the Command Center Insights & Actions (CCIA) stack: alert rule engine, control plane APIs, automated action systems, and insights delivery surfaces
0-to-1 & Scale
- Build new systems from scratch and scale existing ones to support Crusoe's rapidly growing global fleet
Cross-Functional Collaboration
- Work closely with product counterparts to shape requirements early and partner with the data science team to develop and validate detection models
System Design
- Participate in design discussions across teams, contribute architectural perspective, and help evaluate technical trade-offs
Technical Mentorship
- Mentor engineers at all levels through code review, design feedback, and direct coaching, and contribute to hiring by helping define what great looks like
What You'll Bring to the Team
Anomaly Detection & Heuristics Expertise
- Deep experience building anomaly detection systems, heuristics-based rule engines, or ML/RL systems for infrastructure or data-intensive domains
Threshold & Signal Calibration
- Demonstrated ability to reason about precision/recall trade-offs and build feedback loops that keep detection systems accurate over time
Distributed Systems Fundamentals
- Strong foundations in the building blocks of reliable, scalable backend systems
Full Software Engineering Craft
- 5+ years shipping production software; experience with modern compiled or systems languages (Go, Rust, C++, Java, or similar)
Data & Observability Fluency
- Comfortable with time-series data, telemetry pipelines, and observability primitives
Communication
- Ability to explain detection logic, trade-offs, and system behavior clearly to both engineers and non-technical partners
Force Multiplier Mindset
- Make the team better through mentorship, clear technical vision, and genuine investment in the people around you
Bonus Points
- Experience with GPU profiling tools (Nsight, NCCL Inspector) or hardware-level infrastructure diagnostics
- Background in observability platforms or products
- Experience with reinforcement learning applied to operational or infrastructure problems
- Familiarity with large-scale fleet management or cloud infrastructure
- Passion for building team culture and engineering quality of life
Benefits
- Competitive compensation and equity packages
- Restricted Stock Units
- Paid time off, paid holidays & leave of absence programs
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) retirement plan with company match up to 4% of salary
- Volunteer time off
- Global travel insurance & emergency assistance
- Daily meal allowance