Staff Software Engineer, Distributed Systems
Owns core distributed systems infrastructure, including database architecture, caching, rate limiting, and observability, for real-time clinical workflows at scale. Requires 7+ years of experience, strong backend skills in Python/Go/TypeScript, and Postgres expertise. Hybrid in SF office.
What You’ll Own
- Database Architecture & Scaling — Own database performance end-to-end: query optimization, indexing strategy, connection management, capacity planning. Design multi-tenant data patterns that maintain performance while balancing isolation tradeoffs.
- Caching & Latency Optimization — Build caching infrastructure that keeps EHR API latency out of critical user paths. Identify and optimize hot paths across the application. Build instrumentation to catch performance regressions before they reach users.
- Rate Limiting — Design systems that respect EHR API rate limits while maintaining user experience. Build infrastructure that degrades gracefully under load: queue management, circuit breakers, load shedding.
- Reliability & Observability — Dashboards and alerting for database performance, cache hit rates, connection pool utilization, API latency by customer. Systematically identify and harden against failure modes: connection exhaustion, cache stampedes, thundering herds.
Who You Are
- 7+ years in software engineering, including 3+ focused on infrastructure, backend systems, or platform engineering
- Staff-level scope: owned cross-cutting infrastructure, debugged production issues that stumped others
- Strong backend fundamentals in Python, Go, TypeScript, or similar
- Deep experience with relational databases (Postgres preferred)
- Comfort reading code across the stack to trace performance issues
- Track record of diagnosing and solving scaling or reliability problems
- Based in SF, in the office 3x/week
Compensation
Base compensation range of approximately $250,000–$300,000 per year, exclusive of equity.