Staff ML Research Scientist

Leads end-to-end applied ML research in NLP, LLMs, retrieval, and multimodal models for healthcare AI, taking work from experimentation to production deployment with rigorous evaluation and clinician collaboration. Requires 7+ years of experience, an MS or PhD, depth in at least one ML area, and hands-on experience with PyTorch tooling.

190k – 260k · San Francisco, CA · AI Research · Onsite · 7+ YOE

About the role

What You'll Do

Own end-to-end applied research: frame the problem, design experiments, ship to production, and monitor impact against real-world metrics.
Set technical direction across LLMs, retrieval, and multimodal models; run ablations and error analyses that change product decisions.
Build evaluation that matters: link offline metrics to online outcomes; define thresholds, monitoring, and rollback.
Partner with engineering and product (and, when relevant, clinicians or domain experts) to align on data, success criteria, and timelines.
Raise the bar by mentoring peers and codifying standards for reliability, safety, and documentation.
Improve the platform (data, training, serving, observability) to speed iteration and ensure reproducibility.
Explore new directions, with computer vision/vision-language work as a nice-to-have for future strategic initiatives.

What We're Looking For

MS or PhD (or equivalent research experience) in Computer Science, Electrical Engineering, Computational Linguistics, Biomedical Informatics, or a related quantitative field.
7+ years of applied ML research experience (or PhD + 5 years, or equivalent evidence of Staff-level impact).
Depth in one or more of the following areas: LLMs and NLP, computer vision, speech, recommendation/ranking, retrieval, or multimodal modeling.
Strong experimental rigor: clear hypothesis framing, offline→online linkage, calibration and stratified analyses, ablations that influence decisions.
Proven ability to take models to production.
Hands-on with modern tooling: PyTorch and common experiment/ops tools (for example MLflow, Databricks, Ray, or similar).
System thinking: can choose methods based on constraints, design for observability and rollback, and document decisions clearly.
Collaborative communicator who writes crisp design docs and explains complex ideas to non-specialists; comfortable mentoring peers.

Preferred Qualifications

Health data familiarity, including EHR or imaging.
Experience in one or more of the following areas: clinical NLP or LLMs, computer vision, speech, retrieval, or multimodal modeling.
Shipped, measured models in production with monitoring and clear rollback; external or multi-site validation is a plus.
Workflow integration with EHR, RIS, PACS, or reporting systems; PowerScribe or Dragon exposure helpful.
Strong evaluation practices: calibration, slice analysis, and ablations.
Safety and governance in sensitive domains, including PHI handling and HIPAA or FDA-adjacent environments.
Technical mentorship and contributions to team research culture; publications or impactful open-source work.
Practical tooling: PyTorch plus modern ML ops tools such as MLflow, Databricks, Ray, or Triton.

Skills

PyTorch, LLMs, NLP, Computer Vision, Retrieval, Multimodal Modeling, MLflow, Databricks, Ray, Triton
