Applied AI Researcher, Benchmarking
Designs and constructs AI benchmarks and evaluation frameworks to measure the reasoning, reliability, and real-world impact of intelligent systems. Requires experience with model evaluations, statistical rigor, hands-on experience building with AI models, and strong programming skills for prototyping.
Key Responsibilities
- Design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact.
- Construct benchmarks that reflect real-world complexity to judge new architectures, techniques, and releases.
- Explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment.
- Investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability.
Who You Are (Requirements)
- Experience designing and running evaluations: built or maintained benchmarks, test suites, or experimental frameworks.
- Statistical and analytical rigor: design fair, reproducible experiments and extract signal from noisy results.
- Experience building with models (compound AI systems, agentic collaboration, ensembling, ReAct, graph-of-thoughts, etc.).
- Proven track record of research results (publications, public work).
- Daily use of AI tools (e.g., ChatGPT, Cursor, Perplexity).
- Strong programming and data analysis skills for prototypes and experiments.
- A bias toward showing over telling.
Compensation & Benefits
- Base salary: $150K – $250K (depending on experience, location, level).
- Meaningful equity.
- Medical, dental, and vision coverage paid 100% for employees and dependents.
- 401(k), commuter benefits, in-office lunch.
- Access to state-of-the-art models and AI tools.
AI Research Scientist, New Grad – Agents & Reinforcement Learning
Conduct research on autonomous AI agents and reinforcement learning to build self-improving systems that reason, code, and learn at scale within the Snowflake Data Cloud. Requires a PhD (or equivalent) and strong expertise in RL and agentic AI.
Frontier Agents Intern
Research intern on the Agents team building and aligning frontier AI systems for complex agentic and scientific tasks. Focus on post-training methods, evaluation frameworks, self-learning, and scalable agent infrastructure.
Post-Doctoral Researcher
Post-doctoral researcher conducting independent and collaborative AI/ML research focused on high-impact domains like medicine, finance, and law. Requires a recent or imminent PhD and publications in top venues.
Senior Software Engineer - Python/TypeScript
Senior engineer building AI-driven automation systems to replace manual business workflows across operations, sales, and support. Requires 7+ years of experience, production Python/TypeScript skills, and 1-2 years of experience building agentic AI systems.
Research Engineer
As a Research Engineer, you will conduct and enable cutting-edge research, translating it into the core product pipeline. You will develop and improve state-of-the-art data curation strategies, accelerating research and ensuring product innovation.