# Staff Technical Lead for Inference & ML Performance
**Company:** [Fal](https://hotfix.jobs/companies/fal)
**Location:** San Francisco, CA
**Skills:** PyTorch, TensorRT, TransformerEngine, Triton, CUTLASS, Quantization, Model Parallelism, Kernel Authoring, ML Compilers, Distributed Serving
**Posted:** 2025-10-29
> Leads a team that builds and optimizes high-performance ML inference systems for generative models. Drives hands-on optimizations across the performance stack, collaborates with research teams, and mentors engineers to exceed industry benchmarks.
## Job Description
## Responsibilities
- Set technical direction for a team working on kernels, applied performance, ML compilers, and distributed inference to build high-performance inference solutions.
- Provide hands-on IC leadership by contributing to critical inference performance enhancements and optimizations.
- Collaborate with research and applied ML teams to influence model inference strategies and deployment techniques.
- Drive advanced performance optimizations, including model parallelism, kernel optimization, and compiler strategies.
- Mentor and scale a team of performance-focused engineers.

## Requirements
- Deep experience in ML performance optimization for large-scale generative models in production.
- Expertise across the full ML performance stack: PyTorch, TensorRT, TransformerEngine, Triton, and CUTLASS kernels.
- Expert knowledge of inference techniques: quantization, kernel authoring, compilation, model parallelism (tensor parallel (TP), context/sequence parallel, expert parallel), distributed serving, and profiling.
- Lead from the front as a respected IC who enjoys hands-on problem-solving.
- Thrive in cross-functional collaboration with ML teams, researchers, and stakeholders.

## Nice-to-haves
- Experience building inference engines for diffusion and generative media models.
- Track record of industry-leading performance improvements (papers, open-source, benchmarks).
- Leadership experience in scaling technical teams.

**Apply:** https://hotfix.jobs/jobs/staff-technical-lead-for-inference-ml-performance-at-fal-ba7b3f50-5bdb-40e7-abe5-2a8c29caf740
**Canonical:** https://hotfix.jobs/jobs/staff-technical-lead-for-inference-ml-performance-at-fal-ba7b3f50-5bdb-40e7-abe5-2a8c29caf740