User Researcher, AI Evaluations

196k – 230k · San Francisco, CA · New York, NY · Hybrid · 5+ YOE
Summary

UX Researcher defining and scaling evaluation frameworks for Notion's AI-powered experiences. Focuses on establishing rubrics for model output quality and end-to-end user experience, running longitudinal studies, and operationalizing evaluation processes with product and data science teams.

About the role

What You'll Achieve

Define what “good” looks like (frameworks & rubrics)

  • Establish clear, reusable evaluation criteria that reflect real user expectations—helpfulness, trust, tone, control, and transparency
  • Translate qualitative insight into scoring guidance that can be applied consistently across teams and over time

Run recurring evals (longitudinal & feature-specific)

  • Run recurring longitudinal and feature-specific surveys and studies to measure experience quality over time against defined rubrics
  • Lead qualitative studies, side-by-side comparisons, and human-in-the-loop evaluation efforts
  • Help teams spot regressions, benchmark improvements, and understand when expectations shift

Anchor evaluation in real workflows (context > isolated feedback)

  • Ensure evals reflect jobs-to-be-done, user intent, and the full interaction journey (goal setting, delegation, review, iteration)
  • Help teams understand who is evaluating, what they’re trying to do, and why outputs succeed or fail

Identify failure modes & recovery behavior (guardrails)

  • Uncover breakdowns, regressions, and edge cases across the system—from model behavior to UI and integrations
  • Study how people notice issues, correct them, and continue their work
  • Turn insights into actionable guidance for guardrails, fixes, and prioritization

Operationalize evaluation with partners (process & tooling)

  • Collaborate closely with Product, Design, Engineering, and Data Science to align on target use cases
  • Build scalable evaluation loops (human-in-the-loop review, longitudinal studies, and calibration of automated/LLM-judge approaches against human judgment)

Skills You'll Need to Bring

  • Ability to operationalize insight into measurement: turning “soft” user expectations (trust, tone, usefulness, clarity) into concrete rubrics, scoring guidelines, and observable metrics
  • AI fluency and systems thinking: hands-on experience with AI products and the ability to reason about how model behavior, uncertainty, and system constraints shape user experience
  • Experience evaluating AI-enabled products (LLMs, agents, generative UI/workflow automation) and working with Data Science/ML partners on measurement strategy and evaluation tooling
  • Clear communication and impact orientation: aligning diverse partners around shared definitions of quality, creating artifacts that enable teams to act consistently
  • Strong UX research craft (quant + qual): choosing the right methods for the question—interviews, benchmarking, surveys, experiments—and synthesizing into actionable guidance
  • Pragmatism in fast-moving environments: prioritizing ruthlessly, working through ambiguity, balancing scrappy iteration with deep dives

Experience

  • 5+ years doing UX research in industry

Nice to Haves

  • Familiarity with LLM-as-judge methods, prompt design for evaluators, or “golden dataset” creation
  • Experience using AI research tooling for rapid synthesis and communication (e.g., Dovetail, Listen Labs, Maze, Outset), as well as AI observability tooling such as Braintrust
  • Experience using data querying languages (e.g., SQL), scripting languages (e.g., Python), or statistical/mathematical software (e.g., R, SAS, MATLAB)
  • Master’s or PhD in HCI, Psychology, Behavioral Science, Anthropology, Sociology, or a related field

Skills

UX Research, Qualitative Research, Quantitative Research, AI Product Evaluation, LLM Evaluation, Rubric Development, Human-in-the-Loop Evaluation, Data Science Collaboration, Python, SQL