
Software Engineer, Kernel Performance & AI Tooling

266k – 445k | San Francisco, CA | Hybrid

Summary

Develops kernel performance optimizations, AI-assisted tooling, and observability infrastructure for AI-native hardware. Requires strong low-level systems experience, kernel/accelerator expertise, and familiarity with AI workflows for engineering acceleration.

About the role

Responsibilities

  • Build developer tooling and workflows that make kernel development and performance optimization faster, more scalable, and easier to debug, integrate, and deploy.
  • Develop observability, diagnostics, and validation infrastructure that makes AI-assisted optimization systems more interpretable, reliable, and effective.
  • Optimize production kernels end to end by formulating optimization problems, running search loops, analyzing bottlenecks, debugging generated implementations, and landing improvements into production.
  • Design abstractions, interfaces, and automation systems that accelerate kernel optimization, correctness validation, and hardware-software co-design.
  • Improve AI-assisted optimization systems for specialized tasks through better datasets, evaluations, benchmarking, and research infrastructure.
  • Partner across research and engineering teams to turn new ideas into practical systems spanning production needs and long-term infrastructure strategy.

Requirements

  • Strong systems or tooling engineering experience, with a background in low-level software, performance optimization, or infrastructure.
  • Experience with developer tooling, debugging infrastructure, profiling, observability, or workflow design for technical users.
  • Depth in kernel development, accelerator architecture, compiler systems, or related performance-critical domains.
  • Familiarity with AI-assisted systems, agentic workflows, post-training, or reinforcement learning for engineering or research applications.
  • Strong experimental judgment, comfort with ambiguity, and the ability to move fluidly between research exploration and production execution.
  • Interest in compilers, DSLs, program synthesis, or AI for systems.

Preferred

  • Strong systems and tooling engineer with real depth in kernels and accelerators.
  • Comfortable working across software and hardware boundaries, with the ability to reason deeply about performance, abstractions, and system design.
  • Hands-on experience optimizing code for GPUs, high-performance CPUs, or custom accelerators.
  • Views AI not as the end product, but as a force multiplier for engineering productivity and system optimization.

Skills

Kernel Development, Performance Optimization, GPUs, CPUs, Compilers, Developer Tooling, Observability, Profiling, AI-assisted Systems, Reinforcement Learning, DSLs, Program Synthesis, Accelerator Architecture, Hardware-Software Co-design, Debugging Infrastructure