Staff Technical Lead for Inference & ML Performance

Leads a team that builds and optimizes high-performance ML inference systems for generative models. Drives hands-on optimizations across the performance stack, collaborates with research teams, and mentors engineers to exceed industry benchmarks.

San Francisco, CA | ML Engineering | Onsite

About the role

Responsibilities

  • Set technical direction for the team working on kernels, applied performance, ML compilers, and distributed inference to build high-performance inference solutions.
  • Provide hands-on IC leadership by contributing to critical inference performance enhancements and optimizations.
  • Collaborate with research and applied ML teams to influence model inference strategies and deployment techniques.
  • Drive advanced performance optimizations including model parallelism, kernel optimization, and compiler strategies.
  • Mentor and scale a team of performance-focused engineers.

Requirements

  • Deep experience in ML performance optimization for large-scale generative models in production.
  • Expertise in the full ML performance stack: PyTorch, TensorRT, TransformerEngine, Triton, and CUTLASS kernels.
  • Expert knowledge of inference techniques: quantization, kernel authoring, compilation, model parallelism (tensor parallel, context/sequence parallel, expert parallel), distributed serving, and profiling.
  • Lead from the front as a respected IC who enjoys hands-on problem-solving.
  • Thrive in cross-functional collaboration with ML teams, researchers, and stakeholders.

Nice-to-haves

  • Experience building inference engines for diffusion and generative media models.
  • Track record of industry-leading performance improvements (papers, open-source, benchmarks).
  • Leadership experience in scaling technical teams.

Skills

PyTorch | TensorRT | TransformerEngine | Triton | CUTLASS | Quantization | Model Parallelism | Kernel Authoring | ML Compilers | Distributed Serving

Similar roles

ML Engineering jobs

Staff Software Engineer

Build and lead Traba's agentic platform as a founding member of the Agents team. Architect orchestration, evals, model strategy, and integrations for autonomous AI agents in industrial supply chain workflows. Requires 7+ years engineering experience including 2+ years production LLM/agent systems, plus customer immersion and 0-to-1 leadership.

240k – 300k | New York, NY +1 | ML Engineering | Hybrid | 7+ YOE | LLMs | APIs

Staff Software Engineer

Build and scale Traba's applied AI platform by integrating frontier models and agents into production systems for automating industrial staffing pipelines. Requires 7+ years deep full-stack experience with TypeScript/Node.js or Python, PostgreSQL, messaging systems, and distributed systems; AI/LLM production experience a plus.

240k – 300k | New York, NY +1 | ML Engineering | On-site | 7+ YOE | AI | APIs

Staff Machine Learning Engineer

Lead architect and builder of large-scale ASR/NLP/LLM systems for Otter's conversational intelligence products. Owns end-to-end ML lifecycles from research to production deployment, mentoring engineers and setting technical direction.

278k – 330k | Mountain View, CA | ML Engineering | Hybrid | 10+ YOE | JAX | ASR

Staff AI Engineer

Staff-level AI engineer building and shipping LLM/agent-powered observability features that help users detect, triage, and resolve incidents. Requires strong production software engineering experience plus practical GenAI/LLM skills.

175k – 220k | United States | ML Engineering | Remote | 7+ YOE | AWS | GCP

Member of Technical Staff

Build AI agents that navigate digital environments and perform user tasks. Requires strong AI/ML experience, Python proficiency, and product intuition.

220k – 405k | San Francisco, CA | ML Engineering | On-site | 5+ YOE | Go | CDP