# Member of Technical Staff - ML Performance
**Company:** [Modal](https://hotfix.jobs/companies/modal)
**Location:** New York, NY; San Francisco, CA
**Salary:** $150K-$350K
**Experience:** 5+ years
**Skills:** PyTorch, vLLM, TensorRT, CUDA, NVIDIA GPU, ML Frameworks, Linux Kernel, Containers
**Posted:** 2024-12-18
> Engineers optimize ML systems for performance at scale, focusing on GPU utilization, inference engines, and the container runtime to boost throughput and reduce latency for language and diffusion models. Requires 5+ years of experience writing high-performance code, plus experience with PyTorch, CUDA, and performance debugging.
## Requirements
- 5+ years of experience writing high-quality, high-performance code.
- Experience working with PyTorch, high-level ML frameworks, and inference engines (vLLM or TensorRT).
- Familiarity with NVIDIA GPU architecture and CUDA.
- Experience with ML performance engineering. Tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.

## Nice-to-have
- Familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).

**Apply:** https://hotfix.jobs/jobs/member-of-technical-staff-ml-performance-at-modal-41e91085-399c-495f-ae55-52c1de8a3c1a
**Canonical:** https://hotfix.jobs/jobs/member-of-technical-staff-ml-performance-at-modal-41e91085-399c-495f-ae55-52c1de8a3c1a