Applied AI Researcher, Benchmarking

$150k – $250k | San Francisco, CA | New York, NY | Hybrid

Summary

Designs and constructs AI benchmarks and evaluation frameworks to measure reasoning, reliability, and real-world impact of intelligent systems. Requires experience with model evaluations, statistical rigor, building with AI models, and strong programming for prototypes.

About the role

Key Responsibilities

  • Design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact.
  • Construct benchmarks that reflect real-world complexity to judge new architectures, techniques, and releases.
  • Explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment.
  • Investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability.

Who You Are (Requirements)

  • Experience designing and running evaluations: built or maintained benchmarks, test suites, or experimental frameworks.
  • Statistical and analytical rigor: design fair, reproducible experiments and extract signal from noisy results.
  • Experience building with models (compound AI systems, agentic collaboration, ensembling, ReAct, graph-of-thoughts, etc.).
  • Proven track record of research results (publications, public work).
  • Daily, hands-on use of AI tools (ChatGPT, Cursor, Perplexity).
  • Strong programming and data analysis skills for prototypes and experiments.
  • A bias toward showing rather than telling.

Compensation & Benefits

  • Base salary: $150K – $250K (depending on experience, location, level).
  • Meaningful equity.
  • 100% covered medical, dental, and vision for employees and dependents.
  • 401(k), commuter benefits, in-office lunch.
  • Access to state-of-the-art models and AI tools.

Skills

AI benchmarks, evaluation frameworks, LLM evaluation, ReAct, graph-of-thoughts, ensembling, Python, data analysis, compound AI systems, agentic systems