Research Engineer, Multimodal

225k – 400k · Redwood City, CA · ML Engineering · Onsite

Summary

Research Engineer advancing video and image generation models for AI characters: leading fine-tuning, designing novel architectures, building data pipelines, and optimizing models for inference and serving, using PyTorch and multimodal techniques. Requires expertise in generation models and distributed training.

About the role

What You'll Do

  • Lead fine-tuning and continued training of video generation models, including image-to-video and joint audio-visual generation.
  • Design and experiment with novel model architectures for multimodal generation, including multimodal conditioning (voice, structured text, reference images).
  • Leverage techniques such as LoRA, RLHF, and full-parameter fine-tuning to improve model quality.
  • Design and build large-scale data pipelines and automated annotation workflows.
  • Explore model compression, inference acceleration, and serving optimizations for real-time video processing.

Who You Are

  • Strong passion for visual AI and a hands-on approach to problem-solving.
  • Proficient in PyTorch with end-to-end experience in data processing, model training, and deployment.
  • Solid understanding of video/image generation architectures (diffusion models, DiT, ControlNet, SOTA video models).
  • Experience with multimodal model training (audio, vision, language).
  • Experience with distributed training tools (FSDP, DeepSpeed).
  • Experience with large-scale data processing and dataset construction.

Nice to Have

  • Experience with joint audio-visual or speech-conditioned generation.
  • Experience with AIGC, video effects, character animation, or asset generation.
  • Familiarity with ML deployment (Kubernetes, Slurm, Docker, cloud platforms).
  • Publications at NeurIPS, ICLR, CVPR, ECCV, or ICCV.

Skills

PyTorch, diffusion models, DiT, ControlNet, LoRA, RLHF, FSDP, DeepSpeed, Kubernetes, Docker