Member of Technical Staff - ML Training Systems

Build and optimize ML training systems for production-scale language models using PyTorch and frameworks like Hugging Face. Requires 5+ years experience in high-performance code and training optimizations; onsite in NYC or SF.

150k – 350k · New York, NY · San Francisco, CA · ML Engineering · Onsite · 5+ YOE

About the role

Requirements

  • 5+ years of experience writing high-quality, high-performance code.
  • Experience working with PyTorch and high-level training frameworks (Hugging Face, verl, slime).
  • Experience with ML training optimization (tell us a story about eliminating data loading bottlenecks, overlapping communications with compute, rewriting a trainer to handle off-policy rollouts, etc.)
  • Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.)
  • Ability to work in person in our NYC or San Francisco office.

Skills

PyTorch · Hugging Face · ML Training Optimization · Linux Kernel · File Systems · Containers
