Data Scientist, Integrity Measurement

Owns measurement, metrics, and analysis for trust and safety harms, including prevalence estimation and response gaps, using AI-first methods. Requires strong statistics, data programming (Python/R/SQL), and trust and safety experience.

293k – 385k · San Francisco, CA · New York, NY · Data Science · Hybrid

About the role

In this role, you will:

Own measurement and quantitative analysis for severe, actor- and network-based usage harm verticals.
Develop and implement AI-first methods for prevalence measurement and other productionised safety metrics, including off-platform indicators.
Build metrics for goaling or A/B tests.
Own dashboards and metrics reporting for harm verticals.
Conduct analyses to inform improvements to review, detection, or enforcement.
Optimise LLM prompts for measurement.
Collaborate with safety teams on policies.
Provide metrics for leadership and external reporting.
Develop automation leveraging agentic products.

You might thrive in this role if you:

Are a senior data scientist with trust and safety experience.
Have deep statistics skills, especially sampling methods and prevalence estimation.
Have experience with severe harm areas like child safety or violence.
Are an excellent communicator with strong cross-functional skills.
Are proficient in data programming languages (R or Python, plus SQL).
(Ideally) have experience with AI harms or leveraging AI for measurement.

Skills

Python · R · SQL · Statistics · Sampling Methods · Prevalence Estimation · LLM Prompting · Dashboards · A/B Testing · AI/ML

Similar roles

Data Science jobs

OpenAI

Data Scientist, Core Experimentation

Leads evolution of OpenAI's core experimentation platform, driving statistical strategy, designing methodologies, and building scalable Python/Spark pipelines to ensure reliable, trustworthy experiments at massive scale. Requires deep stats expertise, causal inference, and production experimentation experience.

293k – 325k · Bellevue, WA +1 · Data Science · Hybrid · Spark · CUPED

Anthropic

Data Scientist, Supply

Data Scientist focused on compute allocation and causal inference to optimize AI infrastructure decisions and connect supply choices to user outcomes. Requires strong Python/SQL skills and experience with constrained optimization and production systems.

285k – 460k · San Francisco, CA +1 · Data Science · On-site · 5+ YOE · SQL · Python

OpenAI

Economist

Economist (up to 5 years post-PhD) conducting empirical research on AI’s economic impacts using large datasets, causal inference, and structural modeling. Requires PhD and strong econometrics/SQL/Python skills.

266k – 385k · San Francisco, CA · Data Science · Hybrid · 3+ YOE · R · SQL

Anthropic

Research Economist, Economic Research

Measures AI's economic impacts through the Anthropic Economic Index using econometrics, ML, and novel data. Conducts empirical research on labor markets, productivity, inequality; requires PhD in Economics and strong empirical track record.

320k – 405k · San Francisco, CA · Data Science · Hybrid · R · SQL

OpenAI

Data Scientist, Safety Systems

Lead data-driven safety evaluation for AI production systems by defining metrics, implementing statistical methods, building dashboards, and analyzing real-world impacts. Requires 5+ years in quantitative roles with leadership and strong stats expertise.

255k – 405k · San Francisco, CA · Data Science · On-site · 5+ YOE · SQL · NLP