Model Behavior Engineer

Owns quality for Notion AI products through context engineering, evals, data analysis, debugging, and model evaluation with top AI labs. Requires LLM experience, analytical skills, and a driver mentality; no traditional coding.

98k – 140k | New York, NY | ML Engineering | Hybrid

About the role

What You'll Achieve

Context engineering: Design, test, and iterate on system prompts, tool prompts, and context strategies.
Understand and debug: Analyze production data, transcripts, logs, and user feedback; reproduce issues and find root causes.
Build evals and measurement: Design eval strategies, build datasets, track quality, and own improvement loops.
Evaluate and launch new models: Benchmark models from OpenAI, Anthropic, and Google on quality, latency, cost, and edge cases.
Drive quality priorities: Surface issues for eng and product teams, and own the quality narrative.
Build tooling and systems: Manage AI observability (e.g., Braintrust) and build playbooks and tools.

Skills You’ll Need

Driver mentality and a bias to action.
Curiosity about LLM capabilities.
Analytical instinct for finding signal in noise.
Comfort working with data (SQL, coding agents).
Clear communication.
Experience with LLMs, prompting, or AI products.

Nice to Haves

Background in engineering, product, data science, research, or consulting.
Have built personal projects, side projects, or startups.

Compensation

For New York City: $98,000 - $140,000 base salary per year, plus equity and benefits.

Skills

LLMs, Prompt Engineering, SQL, AI Evals, Context Engineering, Braintrust, AI Observability, Model Benchmarking, Data Analysis, Production Debugging
