Staff Software Engineer, ML Performance & Systems

Designs and implements novel model serving architectures on an in-house inference engine to maximize throughput and minimize latency for generative media models. Develops performance tools and collaborates with ML teams on optimizations for Nvidia-based systems.

180k – 250k · San Francisco, CA · ML Engineering · Onsite

About the role

Key Responsibilities

  • Help fal maintain its frontier position on model performance for generative media models.
  • Design and implement novel approaches to model serving architecture on top of our in-house inference engine, focusing on maximizing throughput while minimizing latency and resource usage.
  • Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities.
  • Work closely with our Applied ML team and customers (frontier labs in the media space) to make sure their workloads benefit from our accelerator.

Requirements

  • Strong foundation in systems programming with expertise in identifying and fixing bottlenecks.
  • Deep understanding of the cutting-edge ML infrastructure stack (PyTorch, TensorRT, TransformerEngine, Nsight), including model compilation, quantization, and serving architectures.
  • Fundamental understanding of the underlying hardware (Nvidia-based systems), including custom GEMM kernels with CUTLASS.
  • Proficiency in Triton, or comparable experience in lower-level accelerator programming.
  • Experience with multi-dimensional model parallelism (tensor parallelism combined with context/sequence parallelism).
  • Familiarity with the internals of Ring Attention, FA3 (FlashAttention-3), and FusedMLP implementations.

Compensation

$180,000 - $250,000 + equity + comprehensive benefits package

Skills

PyTorch · TensorRT · TransformerEngine · Nsight · Triton · CUTLASS · Nvidia Hardware · Model Compilation · Quantization · Model Serving · Ring Attention · FA3 · FusedMLP

Similar roles

ML Engineering jobs

Member of Technical Staff - X Search

Develops and operates large-scale search engine infrastructure, including retrieval algorithms, indexing, and ML ranking models integrated with Grok AI. Requires experience with search systems, vector databases, and production ML in Python, Go, or Rust.

180k – 440k · Palo Alto, CA · ML Engineering · On-site · Go · Rust

Member of Technical Staff - Post-Training and RL

Develops advanced post-training and reinforcement learning techniques like RLHF/DPO and reward modeling to enhance AI model reasoning, truthfulness, and real-world capabilities at xAI. Seeks passionate AI enthusiasts obsessed with truth-seeking models; prior experience preferred but not required.

180k – 600k · Palo Alto, CA · ML Engineering · On-site · DPO · JAX

Member of Technical Staff - Multimodal Understanding

Develops large-scale distributed systems and pipelines for multimodal AI pre-training, post-training, and inference across image, video, audio, and text. Requires expert Python proficiency, experience with JAX/PyTorch/XLA, and scaling multimodal ML systems.

180k – 440k · Palo Alto, CA · ML Engineering · On-site · RL · JAX

Staff Data Scientist | Modeling

Advances ML models for healthcare claims auditing by curating data, developing precise models, and optimizing business impact for health plans. Requires expertise in SQL, Python/R, and building ML models from scratch.

180k – 260k · United States · ML Engineering · Remote · R · SQL

Staff Software Engineer | GenAI & Agentic Workflows

Leads design and development of large-scale AI systems for document processing and agentic workflows in healthcare payment integrity. Requires 6+ years experience with Java/Python, productionizing LLMs/RAG, and distributed systems.

180k – 250k · United States · ML Engineering · Remote · 6+ YOE · RAG · Ray