
Staff Cloud Hypervisor R&D

Leads R&D for a next-generation hypervisor optimized for AI workloads on massive GPU fleets, focusing on near-zero-latency hardware virtualization, performance profiling, and security isolation. Requires 7+ years of experience in hypervisor internals, kernel development, or low-level systems programming, along with AI hardware expertise.

204k – 247k | San Francisco, CA | Sunnyvale, CA | DevOps / SRE | Onsite | 7+ YOE

About the role

What You’ll Be Working On

  • Next-Gen Hypervisor Architecture: Lead the R&D and implementation of core hypervisor components (KVM, QEMU, or custom Rust-based solutions) specifically optimized for massive-scale GPU fleets.
  • AI Hardware Virtualization: Develop and refine advanced hardware pass-through and abstraction techniques (SR-IOV, VFIO, mdev) to ensure NVIDIA GPUs and BlueField DPUs operate with near-zero latency in a multi-tenant environment.
  • The "Holy Grail" Challenges: Solve high-stakes technical hurdles such as live migration for AI workloads with 80GB+ VRAM and optimizing PCIe peer-to-peer communication between virtualized accelerators.
  • Performance Research & Profiling: Conduct deep-dive bottleneck analysis across the entire stack—from CPU microarchitecture and MMU virtualization to guest OS scheduling—to minimize jitter and maximize throughput.
  • Open Source Leadership: Actively contribute to and maintain upstream open-source virtualization projects, positioning Crusoe as a thought leader in the Linux kernel and virtualization communities.
  • Security & Isolation: Architect robust security boundaries for AI-native cloud infrastructure, balancing high-performance hardware access with strict multi-tenant isolation and hardening.

What You’ll Bring to the Team

  • 7+ Years of Experience: Proven track record in hypervisor internals, kernel development, or low-level systems programming.
  • Deep Virtualization Expertise: Expert-level knowledge of CPU virtualization (Intel VT-x, AMD-V) and memory virtualization (EPT/NPT, HugePages). Comfortable discussing the nuances of VMExit overhead.
  • Hardware-Software Integration: Experience working with specialized AI hardware, including GPUs, InfiniBand/RoCE NICs, and SmartNICs/DPUs.
  • Programming & Tooling: Mastery of C and C++ is required; proficiency in Rust for modern systems programming is highly preferred. Experience with QEMU, KVM, and Linux kernel debugging tools (perf, ftrace, eBPF).
  • I/O Mastery: Deep understanding of VirtIO, vhost-user, and hardware-accelerated I/O paths.
  • Technical Leadership: Experience leading complex, cross-functional projects that bridge the gap between hardware engineering and cloud control planes.

Compensation

Compensation will be paid in the range of $204,000 - $247,000. Restricted Stock Units are included in all offers.

Skills

KVM, QEMU, Rust, SR-IOV, VFIO, NVIDIA GPUs, BlueField DPUs, Linux Kernel, C, C++, VirtIO, vhost-user, eBPF, perf (Linux profiler)
