Member of Technical Staff

Build specialized evals and automated pipelines to measure and improve answer quality for Perplexity's LLM-powered search engine, focusing on retrieval, tool calls, and visual rendering. Requires 4+ years in data science/ML, strong Python/SQL, and cloud experience (MS/PhD preferred).

200k – 300k | San Francisco, CA | Data Science | Hybrid | 4+ YOE

About the role

Responsibilities

Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness.
Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality.
Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices.
Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product, adapting and incorporating them into our regular performance measurements.
Operate within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve Answer Quality.

Requirements

PhD or MS in a technical field or equivalent experience.
4+ years of experience in data science or machine learning.
Strong proficiency in Python and SQL (expected to write production-grade code).
Experience building within a modern cloud data stack, specifically AWS and Databricks.
Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster.

Preferred Qualifications

1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups.
Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale.
A strong research background, with experience applying research methods to real-world ML problems.
Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets.

Skills

Python, SQL, AWS, Databricks, LLMs, VLM, Machine Learning, Data Science, LLM-as-a-Judge

Similar roles

Sardine

Lead - POC Data Science

Lead a PoC data science team delivering fraud-prevention ML models and client-facing proof-of-concept projects for enterprise financial institutions. Player-coach role combining people leadership, hands-on modeling, and direct client engagement.

200k – 280k | United States | Data Science | Remote | 10+ YOE | SQL, Spark

Zocdoc

Staff Data Scientist, Marketplace

As a Staff Data Scientist, Marketplace Analytics, you will be a senior analytical voice at Zocdoc, leading experimentation and analytical strategy across product, growth, monetization, and marketing initiatives. You will apply rigorous experimentation and causal inference frameworks to complex business problems.

200k – 270k | New York, NY | Data Science | Hybrid | 8+ YOE | SQL, Python

Imprint

Staff Data Scientist

Staff Data Scientist owning end-to-end analytical projects that influence product decisions, marketing campaigns, and executive strategy. Applies statistical methods, experimentation design, and AI-powered systems to improve customer lifetime value and business outcomes.

200k – 250k | New York, NY +1 | Data Science | Hybrid | 7+ YOE | SQL, LLMs

OpenX

Staff Data Scientist

Leads cross-team data science initiatives architecting scalable ML systems for ad marketplace optimization, bidding strategies, and prediction at exchange scale. Requires PhD with 6+ years or MS/BS with 8+ years in ML, deep learning expertise, Python/SQL, and production ML experience.

196k – 219k | New York, NY | Data Science | Remote | 6+ YOE | SQL, GCP

Sift

Staff Data Scientist

Staff Data Scientist owning advanced ML modeling strategies for fraud detection across payment fraud, account takeover, and identity abuse. Requires 5+ years production modeling experience, deep fraud/security domain expertise, and mastery of tree-based, deep learning, and graph methods.

195k – 265k | United States | Data Science | Remote | 5+ YOE | CNNs, RNNs