Research Engineer, Multimodal

225k – 400k · Redwood City, CA · ML Engineering · Onsite · Mar 6

Summary

Research Engineer role advancing video and image generation models for AI characters: leading fine-tuning, designing novel architectures, building data pipelines, and optimizing inference, using PyTorch and multimodal techniques. Requires expertise in generative models and distributed training.

About the role

What You'll Do

Lead fine-tuning and continued training of video generation models, including image-to-video and joint audio-visual generation.
Design and experiment with novel model architectures for multimodal generation, including multimodal conditioning (voice, structured text, reference images).
Leverage techniques such as LoRA, RLHF, and full-parameter fine-tuning to improve model quality.
Design and build large-scale data pipelines and automated annotation workflows.
Explore model compression, inference acceleration, and serving optimizations for real-time video processing.

Who You Are

Strong passion for visual AI and a hands-on approach to problem-solving.
Proficient in PyTorch with end-to-end experience in data processing, model training, and deployment.
Solid understanding of video and image generation architectures (diffusion models, DiT, ControlNet, state-of-the-art video models).
Experience with multimodal model training (audio, vision, language).
Experience with distributed training tools (FSDP, DeepSpeed).
Experience with large-scale data processing and dataset construction.

Nice to Have

Experience with joint audio-visual or speech-conditioned generation.
Experience with AIGC, video effects, character animation, or asset generation.
Familiarity with ML deployment (Kubernetes, Slurm, Docker, cloud platforms).
Publications at NeurIPS, ICLR, CVPR, ECCV, or ICCV.

Skills

PyTorch, diffusion models, DiT, ControlNet, LoRA, RLHF, FSDP, DeepSpeed, Kubernetes, Docker
