Lead AI Engineer

Leads development of TRAM, a proprietary AI reasoning model for interpreting global trade law. The work includes building data pipelines, fine-tuning LLMs, and designing evaluation frameworks for fast, accurate compliance determinations. Requires AI product experience, especially with RAG systems and model fine-tuning.

250k – 280k · San Francisco, CA · ML Engineering · Onsite


About the role

What You'll Do

Within weeks:

Lead development of new features aimed at increasing TRAM’s test-time accuracy
Work on the underlying data and retrieval pipelines that help power our AI workflows
Work directly with our internal tax experts to understand how TRAM can better reason like them

Within months:

Own TRAM’s eval framework and workflows
Work directly with leading frontier labs to fine-tune models on our proprietary data using reinforcement learning

Requirements

Prior experience building AI-enabled products, particularly RAG systems
Experience fine-tuning base models, ideally via reinforcement fine-tuning (RFT)
Willingness to dive into technical tax problems
A strong understanding of how LLMs and reasoning models function

Nice to Haves

Experience working with LLMs on legal applications
Experience with RAG data pipelines, including collecting and curating the data that feeds them

Skills

RAG · LLMs · Fine-Tuning · Retrieval Pipelines · Evaluation Frameworks · Data Pipelines · Reasoning Models

Similar roles

ML Engineering jobs

OpenAI

Researcher, Alignment Oversight

Designs and runs experiments to improve oversight of increasingly capable AI models, including model training, evaluation, and deployment of practical systems. Analyzes failures and develops techniques to train more aligned models using oversight signals.

250k – 445k · San Francisco, CA · ML Engineering · Hybrid · LLMs · PyTorch

Luma AI

Research Scientist / Engineer — Multimodal Agent

Builds and trains large-scale multimodal agentic models involving reasoning, planning, coding, and tool calling. Requires strong ML foundations, PyTorch expertise, and experience with distributed training on massive datasets.

250k – 450k · Palo Alto, CA · ML Engineering · Hybrid · VLM · LLMs

Variance

Research Engineer, Evals

Builds benchmarks, datasets, and evaluation systems to measure and improve AI model quality on fraud, identity, and risk judgment tasks. Collaborates across research, engineering, and product to drive rigorous experimentation and iteration in high-stakes environments.

250k – 400k · San Francisco, CA · ML Engineering · On-site · LLMs · Python

Variance

Research Engineer, Judgment Systems

Designs evaluations, studies model failures, and builds research loops to improve AI agents for high-stakes fraud detection and judgment tasks. Requires ML training experience, experimental rigor, and strong engineering skills in adversarial environments.

250k – 400k · San Francisco, CA · ML Engineering · On-site · LLMs · Python

Moment

AI Engineer

Builds and deploys AI primitives and agents to automate workflows and enhance user experiences on an investment management platform. Requires AI agent experience, distributed systems knowledge, and product-minded engineering across the tech stack.

250k – 325k · New York, NY · ML Engineering · On-site · Go · Python