Staff Technical Lead for Inference & ML Performance

Leads a team that builds and optimizes high-performance ML inference systems for generative models. Drives hands-on optimizations across the performance stack, collaborates with research teams, and mentors engineers to exceed industry benchmarks.

San Francisco, CA | ML Engineering | Onsite

About the role

Responsibilities

Set technical direction for the team working on kernels, applied performance, ML compilers, and distributed inference to build high-performance inference solutions.
Provide hands-on IC leadership by contributing to critical inference performance enhancements and optimizations.
Collaborate with research and applied ML teams to influence model inference strategies and deployment techniques.
Drive advanced performance optimizations including model parallelism, kernel optimization, and compiler strategies.
Mentor and scale a team of performance-focused engineers.

Requirements

Deep experience in ML performance optimization for large-scale generative models in production.
Expertise across the full ML performance stack: PyTorch, TensorRT, TransformerEngine, Triton, and CUTLASS kernels.
Expert knowledge of inference techniques: quantization, kernel authoring, compilation, model parallelism (tensor parallel, context/sequence parallel, expert parallel), distributed serving, and profiling.
Lead from the front as a respected IC who enjoys hands-on problem-solving.
Thrive in cross-functional collaboration with ML teams, researchers, and stakeholders.

Nice-to-haves

Experience building inference engines for diffusion and generative media models.
Track record of industry-leading performance improvements (papers, open-source, benchmarks).
Leadership experience in scaling technical teams.

Skills

PyTorch, TensorRT, TransformerEngine, Triton, CUTLASS, Quantization, Model Parallelism, Kernel Authoring, ML Compilers, Distributed Serving
