
Senior Applied Research Engineer

167k – 226k · San Francisco, CA · ML Engineering · Hybrid · 5+ YOE
Summary

This Senior Applied Research Engineer drives AI system quality through experimentation with, and evaluation of, RAG, retrieval, and reasoning systems. The role requires 5+ years of applied ML/NLP experience, strong Python skills, and a solid grounding in evaluation methodology.

About the role

Responsibilities

  • Design and evaluate information access + reasoning strategies across RAG, agents, and classic ML: chunking, embedding models, hybrid search, metadata filtering, semantic routing
  • Prototype GenAI workflows (including agentic systems) that map and reason over compliance objects (controls ↔ risks ↔ requirements ↔ evidence)
  • Explore ML + probabilistic approaches where GenAI is not the best fit: classifiers, ranking models, graph/link prediction, calibration, and structured prediction
  • Build and maintain evaluation frameworks: golden datasets, automated quality metrics, regression detection
  • Implement and tune ranking/reranking systems: cross-encoders, LLM-based rerankers, learning-to-rank, custom scoring functions
  • Run experiments to validate hypotheses and quantify improvements before production rollout
  • Debug failure modes and build error taxonomies across retrieval, reasoning, and generation
  • Collaborate with AI and Software Engineers to hand off validated approaches for productionization
  • Stay current on applied research in RAG, agents, LLM evaluation, and relevance modeling; bring innovations into the product

Requirements

  • 5+ years of experience in applied research, data science, or ML with a focus on NLP, information retrieval, or knowledge systems
  • 2+ years of hands-on experience building or contributing to production AI/ML systems
  • Strong foundation in information retrieval: dense and sparse retrieval, embedding models, search relevance
  • Experience with RAG systems: chunking strategies, vector databases, retrieval optimization
  • Proficiency in evaluation methodology: metrics design, golden dataset creation, A/B testing, statistical significance
  • Strong Python skills and comfort with notebook-driven research workflows
  • Experience communicating research findings to engineering teams and translating insights into actionable improvements

Nice-to-Haves

  • Experience with compliance, legal, or document-heavy domains
  • Publications or contributions in IR, NLP, or RAG evaluation

Compensation & Benefits

  • Competitive base salary: $166,900 – $225,900
  • Stock equity (RSUs)
  • Up to 100% employer-paid medical, dental, and vision premiums
  • 401(k) plan, company-paid life and disability insurance
  • Paid Parental Leave (after 6 months)
  • Kindbody fertility and family-building benefits
  • Generous annual professional and personal development stipends
  • Flexible vacation policy and paid holidays
Skills
Python · RAG · Information Retrieval · NLP · Machine Learning · Vector Databases · Evaluation Metrics · A/B Testing · Statistical Analysis · Embedding Models
Similar roles at this salary range
Mem0

Senior Research Engineer

Own the end-to-end lifecycle of memory features for AI agents. Fine-tune models, implement research, build evaluations, and ship production systems with Engineering.

175k – 250k · San Francisco, CA · ML Engineering · On-site · 7+ YOE · RAG, vLLM
Mozilla

Senior Machine Learning Engineer

Senior ML Engineer focused on fine-tuning and deploying LLMs and generative AI features into Firefox, emphasizing privacy, latency, and user experience.

139k – 218k · United States · ML Engineering · Remote · 4+ YOE · Ray, LangChain
Ironclad

Senior Software Engineer, AI

Lead design and delivery of high-priority AI initiatives across multiple codebases. Build and ship AI-powered features with strong backend fundamentals and product sense.

180k – 220k · San Francisco, CA · ML Engineering · Hybrid · 5+ YOE · React, Evals
Mercury

Senior Machine Learning Operations Engineer

Build and operate Mercury's real-time ML inference platform for fraud risk decisioning. Own model deployment, observability, and lifecycle tooling with strong backend Python fundamentals.

167k – 208k · San Francisco, CA +2 · ML Engineering · Hybrid · 5+ YOE · SQL, SHAP
Distyl AI

AI Engineer, Evaluation

Design and implement evaluation frameworks and pipelines for AI systems using Evaluation-Driven Development. Build Python-based test suites, LLM graders, and measurement systems that guide prompt iteration and production deployment decisions.

150k – 250k · San Francisco, CA +1 · ML Engineering · Hybrid · 2+ YOE · Python, AI Systems