Senior Research Engineer
Own the end-to-end lifecycle of memory features for AI agents. Fine-tune models, implement research, build evaluations, and ship production systems with Engineering.
What You'll Do
- Fine-tune and train models for memory extraction, updates, consolidation/forgetting, and conflict resolution; iterate based on data and outcomes.
- Read, reproduce, and implement research: quickly prototype paper ideas, benchmark against baselines, and productionize what wins.
- Build evaluation at scale: automated relevance/accuracy/consistency metrics, gold sets, online A/B and interleaving tests, and clear dashboards.
- Work closely with customers to uncover pain points, turn them into research hypotheses, and validate solutions through field trials.
- Partner with Engineering to ship: design APIs and data contracts, plan safe rollouts, and maintain SOTA latency, reliability, and cost at scale.
Minimum Qualifications
- Experience with RAG or information retrieval (retrieval, ranking, query understanding) in production products.
- Model training/fine-tuning experience (LLMs/encoders) with a strong footing in experimental design and iteration.
- Strong Python; deep experience with PyTorch and familiarity with vLLM and modern serving frameworks.
- Experience building evaluations for complex language, retrieval, and/or generation tasks (gold sets, offline metrics, online tests).
- Able to orchestrate data pipelines to run these models in production with low-latency SLAs (batch + streaming).
- Clear, concise communication with stakeholders (engineering, product, GTM, and customers).
Nice to Have
- Publications at venues such as NeurIPS, ICML, or ACL.
- Experience with privacy-preserving ML (redaction, differential privacy, data governance).
- Deep familiarity with memory/retrieval literature or prior work on memory systems.
- Expertise with embeddings, vector-DB internals, deduplication, and contradiction detection.
Senior Machine Learning Operations Engineer
Build and operate Mercury's real-time ML inference platform for fraud risk decisioning. Own model deployment, observability, and lifecycle tooling with strong backend Python fundamentals.
AI Engineer, Evaluation
Design and implement evaluation frameworks and pipelines for AI systems using Evaluation-Driven Development. Build Python-based test suites, LLM graders, and measurement systems that guide prompt iteration and production deployment decisions.
Senior AI Engineer
Senior Engineer building multi-agent AI systems, LLM integrations, and backend automation services that power Marketing Operations. Owns technical direction for agentic infrastructure connecting models to business systems.
Senior Machine Learning Engineer
Build and deploy cutting-edge Agentic AI and LLM systems to transform Airbnb's customer service experience, including Chat and Voice AI assistants. Requires 6+ years of experience with production ML/AI systems at scale.