# Senior Data Scientist - International eKYC, Identity Graph
**Company:** [Socure](https://hotfix.jobs/companies/socure)
**Location:** Remote
**Salary:** $140K-$170K
**Experience:** 6+ years
**Skills:** Python, SQL, Spark, PySpark, AWS EMR, AWS S3, AWS SageMaker, AWS Neptune, Neo4j, GraphFrames, PyTorch Geometric, DGL, Machine Learning, Entity Resolution, Graph Algorithms
**Posted:** 2026-04-23
> Leads development of ML and graph-based systems for international eKYC and the identity graph, focusing on entity resolution, anomaly detection, and regulatory compliance across global markets. Requires 6+ years of ML experience and expertise in Python, SQL, Spark, and graph technologies such as Neo4j and AWS Neptune.
## Job Description
## Responsibilities

### International eKYC Modeling & Entity Resolution
- Lead the design, development, and deployment of ML and graph-based algorithms for international entity resolution, identity trust scoring, and anomaly detection across heterogeneous, country‑specific datasets.
- Architect reusable matching and linking frameworks that work across multiple ID schemes (e.g., national ID numbers, passports, voter IDs, mobile accounts, bank accounts) and local name/address conventions.
- Develop probabilistic and rule‑augmented models that handle noisy, sparse, or partially labeled international data while maintaining explainability and regulatory defensibility.

### Global Identity Graph & Data Quality
- Define and evolve the international extension of Socure’s identity graph: schema design, linkage strategies, quality tiers, and confidence scoring that can be leveraged by multiple products (Verify, KYC, watchlists, fraud).
- Design and implement robust data quality and monitoring frameworks for international identity data (coverage, stability, drift, regional bias, label quality) and integrate them into modeling and production monitoring workflows.
- Build scalable approaches for handling linguistic and cultural variation (e.g., transliteration, multi‑script names, address normalization, local naming patterns) in the identity graph and matching pipelines.

### Evaluation, Experimentation, and Model Governance
- Own the experimentation strategy for major international eKYC initiatives.
- Design offline evaluations and online A/B tests that reflect local ground‑truth constraints and data sparsity.
- Define success metrics that balance approval rates, fraud capture, and regulatory/operational constraints per market.
- Analyze lift, stability, and fairness trade‑offs and drive go/no‑go decisions with Product and Engineering.
- Define and maintain evaluation frameworks specific to international eKYC (e.g., regional coverage maps, cross‑border identity leakage, local demographic impact, regulatory thresholds).
- Contribute to model governance documentation and support responses to regulators and large enterprise customers regarding model logic, data provenance, fairness, and monitoring for international markets.

### Data Source Strategy & Vendor Evaluation (International)
- Lead the evaluation and integration of international data vendors (e.g., bureaus, telcos, public records, alternative data).
- Design benchmarking methodologies for signal quality, incremental value, stability, and fairness by country/segment.
- Quantify ROI and trade‑offs across multiple vendors and data types; provide clear recommendations that influence product and commercial decisions.
- Partner with Data Acquisition, Legal, and Compliance to ensure that data usage and modeling approaches meet regional regulatory requirements (e.g., GDPR and local privacy/AML/KYC rules).

### Technical Leadership & Cross‑Functional Partnership
- Collaborate with engineering leaders to design scalable, reliable international data and model pipelines using Spark/PySpark, AWS (EMR, S3, SageMaker, Neptune), and modern MLOps workflows.
- Act as a subject‑matter expert on international identity, eKYC regulations, and cross‑border data limitations for internal stakeholders, supporting complex customer questions and strategic roadmap discussions.
- Mentor Data Scientists and Senior Data Scientists on best practices for international modeling: handling low‑label regimes, domain adaptation, localization of thresholds/logic, and building reusable abstractions instead of one‑off country fixes.
- Communicate strategy, progress, and results to senior leadership and cross‑functional partners through clear documents and presentations, framing complex technical work in terms of business impact, regional risk, and regulatory trade‑offs.

## Requirements

### Education & Experience
- Master’s or Ph.D. in Computer Science, Data Science, Machine Learning, Statistics, Mathematics, or a related field, or equivalent practical experience.
- 6+ years of hands-on applied ML / data science experience (4+ with a Ph.D.), including owning production models and pipelines in high‑stakes domains (fraud, risk, identity, payments, credit, or similar).
- Significant prior work on international or multi‑region products is strongly preferred (e.g., cross‑country KYC, credit risk, payments, or compliance systems).

### Technical Skills
- Expert‑level proficiency in Python and SQL, with extensive experience in distributed data processing (Spark/PySpark, Databricks or similar) on very large datasets.
- Deep experience designing, training, and deploying models for classification, ranking, anomaly detection, and/or graph learning, including:
  - Feature engineering for noisy/heterogeneous identity data.
  - Robust evaluation under label sparsity and feedback delays.
  - Calibration and thresholding tailored to regional risk and regulatory constraints.
- Proven expertise with graph technologies (e.g., Neo4j, AWS Neptune, GraphFrames, DGL, PyTorch Geometric) and graph algorithms (entity resolution, link prediction, community detection, label propagation) at scale.

**Apply:** https://hotfix.jobs/jobs/senior-data-scientist-international-ekyc-identity-graph-at-socure-d287b4f4-bba2-407f-bdd8-62d5d9b67b5c