# Senior Data Scientist - Big Data R&D, Identity Graph & KYC
**Company:** [Socure](https://hotfix.jobs/companies/socure)
**Location:** Remote
**Salary:** $140K-$170K
**Experience:** 3+ years
**Skills:** Python, PySpark, Spark, AWS, Databricks, scikit-learn, XGBoost, TensorFlow, PyTorch, SQL, Airflow, Neo4j, GraphFrames, A/B Testing, Machine Learning
**Posted:** 2026-04-23
> Leads design and deployment of ML, statistical, and graph algorithms for entity resolution, identity graphs, and KYC on large-scale PII datasets. Builds scalable pipelines in Spark/PySpark on AWS/Databricks, runs experiments, and mentors junior team members. Requires a Master's degree with 3+ years of experience, or a Ph.D. with 1+ years.
## Job Description
## What You'll Do

- Own the design, development, and evaluation of machine learning, statistical, and graph-based algorithms for entity resolution, identity trust scoring, and anomaly detection on massive datasets.
- Architect and optimize graph-based identity representations (identity graph structure, linkage rules, clustering) to improve match rates, reduce false positives/negatives, and support downstream fraud and KYC models.
- Build and maintain scalable data pipelines and feature stores in **Spark/PySpark** (or Scala), including data normalization, deduplication, and feature computation across large PII datasets in **AWS/Databricks** environments.
- Lead **A/B tests** and offline/online experimentation for new models, features, and data sources; define success metrics, design experiments, and ensure rigorous validation before rollout.
- Evaluate new internal and external data sources: explore signal quality, design backtests, quantify incremental value, and provide clear recommendations on vendor selection and integration.
- Partner closely with product managers and engineers to translate ambiguous business and regulatory requirements (e.g., KYC coverage, watchlist matching) into concrete modeling and data roadmaps.
- Provide deep analytical support to Socure’s compliance and regulatory product suite, including investigative analyses, root‑cause analysis for anomalies, and clear narratives for internal and external stakeholders.
- Contribute to model governance and documentation: clearly explain model logic, data dependencies, limitations, and monitoring plans to internal risk/compliance stakeholders.
- Mentor junior data scientists and engineers on best practices in data exploration, feature engineering, experimentation, and code quality.
- Communicate complex technical concepts and trade‑offs in a concise, structured way to both technical and non‑technical audiences (e.g., product reviews, customer meetings, internal briefings).

## What You Bring

- **Master’s** degree with **3+ years** of relevant industry experience, or **Ph.D.** with 1+ years of experience in applied ML / data science roles; background in Computer Science, Statistics, Mathematics, or related quantitative fields preferred.
- Strong proficiency in **Python** (preferred) or **Scala**, including experience with ML libraries such as **scikit-learn**, **XGBoost**, **TensorFlow**, or **PyTorch**.
- Extensive experience with **Spark** or **PySpark** and distributed data systems (e.g., **AWS EMR**, **Databricks**) working on very large, messy datasets.
- Deep understanding of supervised and unsupervised learning, feature engineering, model evaluation, and experiment design (**A/B testing**, holdout strategies, stratification).
- Experience developing production-quality data pipelines and automated workflows using **Airflow** or similar orchestration tools.
- Practical familiarity with graph databases and/or graph frameworks (**Neo4j**, **AWS Neptune**, **GraphFrames**, **DGL**, **PyTorch Geometric**) and graph algorithms for clustering, link prediction, and community detection is strongly preferred.
- Solid **SQL** skills and experience working with large-scale analytical data stores.
- Experience in at least one of: identity verification, fraud detection, credit risk, or adjacent high‑stakes domains is a plus.
**Apply:** https://hotfix.jobs/jobs/senior-data-scientist-big-data-r-d-identity-graph-kyc-at-socure-7c47bd1f-5c10-4996-b613-5a2bf72af886
**Canonical:** https://hotfix.jobs/jobs/senior-data-scientist-big-data-r-d-identity-graph-kyc-at-socure-7c47bd1f-5c10-4996-b613-5a2bf72af886