Software Engineer (Applied AI)

Owns end-to-end AI systems for a relationship intelligence platform: multi-agent LLM pipelines, classical ML models for ranking and entity resolution, evals, and the data infrastructure used to analyze deals and conversations for enterprise revenue teams. Requires production experience shipping LLM/ML products and strong backend skills.

170k – 220k | San Francisco, CA | New York, NY | ML Engineering | Remote

About the role

What You Will Do

Design and ship multi-agent systems that handle the hardest reasoning problems in the product: stakeholder mapping, account research, deal health analysis, conversation intelligence.
Own the LLM pipelines end to end: prompt engineering, retrieval, tool use, structured outputs, guardrails, and the orchestration glue that ties it all together.
Build and maintain the ML and DS work that LLMs aren't the right tool for: ranking models, classifiers, embedding models, entity resolution across messy CRM data, signal extraction from sales conversations.
Fine-tune models when frontier APIs aren't enough. Curate training data, design eval sets, run experiments, and ship the results to production.
Build the eval infrastructure that lets us ship AI features without breaking them: LLM-as-judge, human-in-the-loop review, classical metrics for ML systems, and regression suites.
Own the data flywheel. The product generates rich signal from customer conversations, deal outcomes, and stakeholder interactions. Turn that into training data, eval data, and the feedback loops that compound over time.
Stay on the frontier. New models drop monthly. You'll know which ones move the needle for our use cases, when to switch, and when to wait.
Talk to customers. Sit on calls, see what's actually broken, and translate that into the AI capabilities that matter.

What We Are Looking For

Demonstrated experience shipping LLM-powered products to production with real customers and real evals.
Demonstrated experience training, fine-tuning, or shipping classical ML models in production. Ranking, classification, embeddings, retrieval.
Strong fluency with multi-agent systems, tool use, function calling, RAG, and the orchestration patterns that make them reliable. Frameworks are tools, not religion.
Real expertise in evaluation across both LLM and ML systems.
Strong backend engineering fundamentals. Most of this work lives in production services, not notebooks. Python is required; familiarity with TypeScript, Postgres, queues, and AWS is a major plus.
Sharp instinct for cost, latency, and reliability tradeoffs across the AI stack.
Excellent written and verbal English communication.
Demonstrated ability to operate independently.

Preferred Qualifications

Background as an MLE who has flexed into LLM application work, or as an LLM engineer with deep MLE foundations.
Experience fine-tuning open or closed models for specific tasks, including data curation, training infrastructure, and post-training evaluation.
Experience with multi-agent orchestration frameworks (LangGraph, Mastra, custom orchestrators) at production scale.
Experience with classical ML systems in production: ranking models, embedding models, entity resolution, recommendation systems.

Compensation

$170,000 to $220,000 base salary depending on level, plus...

Skills

LLMs, Multi-Agent Systems, Prompt Engineering, RAG, Fine-Tuning, Evaluation Frameworks, Python, Ranking Models, Classifiers, Embedding Models, Entity Resolution, TypeScript, Postgres, AWS, LangGraph
