# Applied Research Scientist - Foundation Models
**Company:** [Ambient.ai](https://hotfix.jobs/companies/ambient-ai)
**Location:** Redwood City, CA
**Salary:** $140K-$175K
**Skills:** PyTorch, TensorFlow, Python, C++, Vision Transformers, Transformers, CNNs, Vision-Language Models, Model Distillation, Quantization, Model Pruning, Multimodal AI
**Posted:** 2025-10-22
> Develops and optimizes transformer-based vision-language foundation models for physical security, owning full-cycle training, fine-tuning, compression, and deployment for real-time inference on images, videos, and text. Requires PhD/Master's in CS/EE, hands-on ML expertise with PyTorch/TensorFlow, Transformers, and ViTs.
## Job Description
## What you'll do

- **Develop & Optimize VLMs**: Design transformer-based vision-language models that understand images, videos, and text, and optimize them for real-time inference.
- **Pre-training & Fine-tuning**: Own the full training pipeline, from pre-training on image-text data to fine-tuning for Ambient.ai’s physical security domain and use cases.
- **Model Compression & Optimization**: Apply techniques like distillation, quantization, and pruning to reduce model size and latency, enabling efficient edge deployment.
- **Leverage Open-Source & Innovate**: Use and extend state-of-the-art open-source models. Prototype new architectures and training methods to advance Ambient.ai’s multimodal AI research.
- **Cross-Team Collaboration**: Work with engineering and product teams to integrate models into the platform. Iterate based on real-world feedback and deployment data to improve performance.
- **Research and Experimentation**: Stay current with vision, NLP, and multimodal AI research. Design experiments to test new algorithms and continually enhance our core AI systems.

## What you'll bring

- Ph.D. or Master’s in CS, EE, or a related field, with a strong foundation in AI/ML (Ph.D. preferred, or Master’s with strong experience)
- Proficient in Python/C++ and deep learning frameworks like PyTorch or TensorFlow. Comfortable with large-scale training pipelines
- Hands-on experience with CNNs, Transformers, and Vision Transformers (ViT). Strong understanding of vision-language models and how to fine-tune or adapt them
- Proven skills in model training and optimization, including fine-tuning on large datasets and applying distillation, quantization, or similar techniques. Experience with foundation or multimodal models is a plus.
- Strong problem-solving ability: quick prototyping, diagnosing failure cases, and iterating on solutions
- Startup experience preferred: Comfortable with ambiguity, fast iteration, and owning projects end-to-end
**Apply:** https://hotfix.jobs/jobs/applied-research-scientist-foundation-models-at-ambient-ai-a47b6aa1-32f3-4cf3-ae30-44d21ca164c7
**Canonical:** https://hotfix.jobs/jobs/applied-research-scientist-foundation-models-at-ambient-ai-a47b6aa1-32f3-4cf3-ae30-44d21ca164c7