Data Scientist, Core Experimentation

Leads the evolution of OpenAI's core experimentation platform: driving statistical strategy, designing methodologies, and building scalable Python/Spark pipelines to ensure reliable, trustworthy experiments at massive scale. Requires deep expertise in statistics and causal inference, plus production experimentation experience.

293k – 325k | Bellevue, WA | Seattle, WA | Data Science | Hybrid

About the role

Responsibilities

  • Drive the statistical direction and technical strategy for OpenAI’s experimentation platform
  • Design and improve experimentation methodologies used across product and research teams
  • Build pragmatic solutions to real-world experimentation challenges, balancing rigor with operational simplicity
  • Improve the reliability and trustworthiness of experiment results, including detection and prevention of bias, logging issues, and data quality failures
  • Develop scalable analytical systems and pipelines in Python and distributed compute environments
  • Partner with engineers and product teams to improve experiment design, metric quality, and decision-making practices
  • Lead investigations into complex experimentation anomalies and measurement failures
  • Establish best practices for experimentation governance, interpretation, and statistical correctness
  • Mentor other data scientists and raise the overall technical bar for experimentation and causal inference

Requirements

  • Experience building, scaling, or operating experimentation platforms at a large technology company
  • Deep expertise in statistics, causal inference, and online experimentation methodology
  • Strong understanding of practical experimentation challenges in production systems
  • Experience with areas such as variance reduction, CUPED, sequential testing, SRM detection, metric design, or heterogeneous effects
  • Strong coding and systems skills in Python and large-scale data processing frameworks (e.g. Spark)
  • Experience designing analytical data models and scalable experimentation pipelines
  • Ability to communicate complex statistical concepts clearly to technical and non-technical audiences
  • Track record of influencing technical strategy through hands-on technical leadership

Nice-to-Haves

  • Experience in large-scale product experimentation, ML experimentation, ranking systems, marketplace systems, or similar high-scale experimentation domains

Compensation

$293K - $325K USD

Skills

Python, Spark, Statistics, Causal Inference, Online Experimentation, CUPED, Sequential Testing, SRM Detection, Variance Reduction, Metric Design

Similar roles

Data Science jobs

Data Scientist, Integrity Measurement

Owns measurement, metrics, and analysis for trust & safety harms including prevalence estimation and response gaps using AI-first methods. Requires strong statistics, data programming (Python/R/SQL), and trust/safety experience.

293k – 385k | San Francisco, CA +1 | Data Science | Hybrid | R | SQL

Data Scientist, Supply

Data Scientist focused on compute allocation and causal inference to optimize AI infrastructure decisions and connect supply choices to user outcomes. Requires strong Python/SQL skills and experience with constrained optimization and production systems.

285k – 460k | San Francisco, CA +1 | Data Science | On-site | 5+ YOE | SQL | Python

Economist

Economist (up to 5 years post-PhD) conducting empirical research on AI’s economic impacts using large datasets, causal inference, and structural modeling. Requires PhD and strong econometrics/SQL/Python skills.

266k – 385k | San Francisco, CA | Data Science | Hybrid | 3+ YOE | R | SQL

Research Economist, Economic Research

Measures AI's economic impacts through the Anthropic Economic Index using econometrics, ML, and novel data. Conducts empirical research on labor markets, productivity, inequality; requires PhD in Economics and strong empirical track record.

320k – 405k | San Francisco, CA | Data Science | Hybrid | R | SQL

Data Scientist, Safety Systems

Leads data-driven safety evaluation for AI production systems by defining metrics, implementing statistical methods, building dashboards, and analyzing real-world impacts. Requires 5+ years in quantitative roles with leadership and strong stats expertise.

255k – 405k | San Francisco, CA | Data Science | On-site | 5+ YOE | SQL | NLP