People Research Data Scientist, AI Fairness & Bias

198k – 220k | San Francisco, CA | New York, NY | Seattle, WA | Washington, DC | Data Science | Hybrid | 5+ YOE
Summary

People Data Scientist focused on AI fairness and bias testing for People systems. Designs algorithmic audits, validation studies, and fairness infrastructure across the employee lifecycle. Requires deep expertise in fairness metrics, statistical modeling, and Python/R/SQL.

About the role

Responsibilities

  • Define and lead fairness and bias-testing strategies for AI-assisted People processes, models, agents, and decision-support systems from development through deployment and ongoing monitoring.
  • Design rigorous algorithmic audits and validation studies, including adverse-impact analysis, subgroup and intersectional evaluation, error-rate analysis, calibration, measurement invariance, reliability, criterion-related validity, and sensitivity testing.
  • Identify the appropriate fairness criteria for each use case, evaluate tradeoffs among competing definitions of fairness, and clearly document the assumptions, limitations, and residual risks of each approach.
  • Evaluate end-to-end human-AI decision systems, including model outputs, user behavior, human overrides, escalation pathways, and whether AI assistance changes the quality, consistency, or equity of decisions.
  • Develop evaluation approaches for generative and agentic AI, including test-set design, counterfactual testing, behavioral evaluation, human-rating studies, robustness testing, and analysis of disparate performance across populations and contexts.
  • Investigate the sources of observed disparities, including data representation, label and measurement bias, proxy variables, model design, decision thresholds, workflow design, and differential adoption or usage.
  • Partner with engineering, People Operations, Legal, Privacy, Security, and People Systems teams to recommend and evaluate mitigations such as data improvements, model changes, threshold adjustments, workflow redesign, monitoring controls, and additional human oversight.
  • Build scalable fairness-evaluation infrastructure, including reusable datasets, automated validation pipelines, regression tests, monitoring systems, self-service tools, and standardized reporting.
  • Establish research and documentation standards for fairness test plans, dataset and model documentation, validation reports, limitations, monitoring plans, and decision records.
  • Translate complex findings into concise, decision-ready narratives, helping leaders understand the significance of identified risks, the strength of the evidence, available mitigation options, and remaining uncertainty.

Requirements

  • Deep expertise in algorithmic fairness, bias measurement, responsible AI, psychometrics, applied statistics, or the evaluation of high-impact decision systems.
  • Exceptional strength in research design, measurement, experimentation, causal inference, and statistical modeling.
  • Hands-on experience applying methods such as subgroup and intersectional analysis, adverse-impact testing, equalized-odds and equal-opportunity analysis, demographic-parity assessment, calibration analysis, counterfactual testing, measurement invariance, reliability analysis, and validation studies.
  • Strong judgment about the limitations of fairness metrics, including the ability to determine which measures are appropriate for a particular decision context rather than applying a single universal definition of fairness.
  • Experience evaluating machine-learning models, generative AI systems, agents, or human-AI workflows using quantitative and qualitative evidence.
  • High proficiency in Python or R and SQL, with experience working across complex, sensitive, and imperfect datasets.
  • Experience building reproducible evaluation pipelines, automated testing frameworks, analytical tools, monitoring systems, or governed research workflows.
  • Ability to distinguish statistical disparities from their potential causes and to communicate findings without overstating certainty or making unsupported causal or legal conclusions.
  • Ability to work effectively with technical, operational, legal, privacy, and executive stakeholders and influence consequential decisions through evidence and sound judgment.
  • Deep curiosity, intellectual humility, strong attention to detail, and a commitment to developing AI systems and organizational processes that work well for people across different backgrounds and circumstances.

Nice-to-Haves

  • Experience conducting fairness assessments, algorithmic audits, model-risk reviews, adverse-impact analyses, or validation studies in employment or another high-impact domain.
  • Familiarity with fairness and model-evaluation tools such as Fairlearn, AI Fairness 360, responsible-AI evaluation frameworks, explainability methods, or comparable internal tooling.
  • Experience evaluating large language models, generative AI systems, safety classifiers, or agentic workflows, including behavioral testing and human evaluation.
  • Experience with employment selection, talent assessment, psychometrics, organizational research, or the validation of hiring, performance, promotion, or workforce decisions.
  • Familiarity with responsible-AI frameworks and emerging requirements related to automated employment decision systems, algorithmic auditing, data privacy, and AI governance.
  • Experience creating model cards, dataset documentation, fairness scorecards, audit reports, monitoring plans, or other review artifacts for high-impact systems.
  • Advanced degree in Quantitative Psychology, Computer Science, Statistics, Economics, Data Science, Behavioral Science, or a related quantitative field; PhD preferred but not required.
Skills

Python, R, SQL, algorithmic fairness, bias measurement, responsible AI, psychometrics, applied statistics, causal inference, statistical modeling, subgroup analysis, adverse-impact testing, equalized-odds analysis, demographic-parity assessment, counterfactual testing