Senior Data Scientist - International eKYC, Identity Graph

Leads development of ML and graph-based systems for international eKYC and the identity graph, focusing on entity resolution, anomaly detection, and regulatory compliance across global markets. Requires 6+ years of ML experience and expertise in Python, SQL, Spark, and graph technologies such as Neo4j and AWS Neptune.

140k – 170k | Carson City, NV | ML Engineering | Remote | 6+ YOE

About the role

Responsibilities

International eKYC Modeling & Entity Resolution

  • Lead the design, development, and deployment of ML and graph-based algorithms for international entity resolution, identity trust scoring, and anomaly detection across heterogeneous, country‑specific datasets.
  • Architect reusable matching and linking frameworks that work across multiple ID schemes (e.g., national ID numbers, passports, voter IDs, mobile accounts, bank accounts) and local name/address conventions.
  • Develop probabilistic and rule‑augmented models that handle noisy, sparse, or partially labeled international data while maintaining explainability and regulatory defensibility.

Global Identity Graph & Data Quality

  • Define and evolve the international extension of Socure’s identity graph: schema design, linkage strategies, quality tiers, and confidence scoring that can be leveraged by multiple products (Verify, KYC, watchlists, fraud).
  • Design and implement robust data quality and monitoring frameworks for international identity data (coverage, stability, drift, regional bias, label quality) and integrate them into modeling and production monitoring workflows.
  • Build scalable approaches for handling linguistic and cultural variation (e.g., transliteration, multi‑script names, address normalization, local naming patterns) in the identity graph and matching pipelines.

Evaluation, Experimentation, and Model Governance

  • Own experimentation strategy for major international eKYC initiatives.
  • Design offline evaluations and online A/B tests that reflect local ground truth constraints and data sparsity.
  • Define success metrics that balance approval rates, fraud capture, and regulatory/operational constraints per market.
  • Analyze lift, stability, and fairness trade‑offs and drive go/no‑go decisions with Product and Engineering.
  • Define and maintain evaluation frameworks specific to international eKYC (e.g., regional coverage maps, cross‑border identity leakage, local demographic impact, regulatory thresholds).
  • Contribute to model governance documentation and support responses to regulators and large enterprise customers regarding model logic, data provenance, fairness, and monitoring for international markets.

Data Source Strategy & Vendor Evaluation (International)

  • Lead the evaluation and integration of international data vendors (e.g., bureaus, telcos, public records, alternative data).
  • Design benchmarking methodologies for signal quality, incremental value, stability, and fairness by country/segment.
  • Quantify ROI and trade‑offs across multiple vendors and data types; provide clear recommendations that influence product and commercial decisions.
  • Partner with Data Acquisition, Legal, and Compliance to ensure that data usage and modeling approaches meet regional regulatory requirements (e.g., GDPR and local privacy/AML/KYC rules).

Technical Leadership & Cross‑Functional Partnership

  • Collaborate with engineering leaders to design scalable, reliable international data and model pipelines using Spark/PySpark, AWS (EMR, S3, SageMaker, Neptune), and modern MLOps workflows.
  • Act as a subject‑matter expert on international identity, eKYC regulations, and cross‑border data limitations for internal stakeholders, supporting complex customer questions and strategic roadmap discussions.
  • Mentor Data Scientists and Senior Data Scientists on best practices for international modeling: handling low‑label regimes, domain adaptation, localization of thresholds/logic, and building reusable abstractions instead of one‑off country fixes.
  • Communicate strategy, progress, and results to senior leadership and cross‑functional partners through clear documents and presentations, framing complex technical work in terms of business impact, regional risk, and regulatory trade‑offs.

Requirements

Education & Experience

  • Master’s or Ph.D. in Computer Science, Data Science, Machine Learning, Statistics, Mathematics, or a related field, or equivalent practical experience.
  • 6+ years of hands-on applied ML / data science experience (4+ with Ph.D.), including owning production models and pipelines in high‑stakes domains (fraud, risk, identity, payments, credit, or similar).
  • Significant prior work on international or multi‑region products is strongly preferred (e.g., cross‑country KYC, credit risk, payments, or compliance systems).

Technical Skills

  • Expert‑level proficiency in Python and SQL, with extensive experience in distributed data processing (Spark/PySpark, Databricks or similar) on very large datasets.
  • Deep experience designing, training, and deploying models for classification, ranking, anomaly detection, and/or graph learning, including:
    • Feature engineering for noisy/heterogeneous identity data.
    • Robust evaluation under label sparsity and feedback delays.
    • Calibration and thresholding tailored to regional risk and regulatory constraints.
  • Proven expertise with graph technologies (e.g., Neo4j, AWS Neptune, GraphFrames, DGL, PyTorch Geometric) and graph algorithms (entity resolution, link prediction, community detection, label propagation) at scale.

Skills

Python, SQL, Spark, PySpark, AWS EMR, AWS S3, AWS SageMaker, AWS Neptune, Neo4j, GraphFrames, PyTorch Geometric, DGL, Machine Learning, Entity Resolution, Graph Algorithms

Similar roles

ML Engineering jobs

Senior Data Scientist - Big Data R&D, Identity Graph & KYC

Leads design and deployment of ML, statistical, and graph algorithms for entity resolution, identity graphs, and KYC on large-scale PII datasets. Builds scalable pipelines in Spark/PySpark on AWS/Databricks, runs experiments, and mentors junior team members. Requires a Master's degree plus 3 years of experience, or a PhD.

140k – 170k | Carson City, NV | ML Engineering | Remote | 3+ YOE | AWS | SQL

Senior Software Engineer, Agents

Build and scale AI agents automating healthcare workflows. Lead complex feature development using JavaScript/TypeScript/Node.js, LLMs, and DevOps tools. Requires 4+ years experience and CS degree.

140k – 230k | Mountain View, CA | ML Engineering | On-site | 4+ YOE | RAG | GPT

Senior AI Engineer

Senior AI Engineer building internal AI-powered solutions end-to-end for GitLab's Sales, Marketing, and Support teams. Responsible for diagnosing problems, selecting models, designing agentic systems with guardrails, and shipping production solutions that improve organizational flow.

139k – 218k | United States | ML Engineering | Remote | 5+ YOE | RAG | LLMs

Senior Machine Learning Engineer

Senior ML Engineer focused on fine-tuning and deploying LLMs and generative AI features into Firefox, emphasizing privacy, latency, and user experience.

139k – 218k | United States | ML Engineering | Remote | 4+ YOE | Ray | LLMs

Senior Machine Learning Engineer, AI Platform

Design, build, and operate Mozilla's AI platform for training, deploying, and serving ML models at scale. Requires 4-6 years experience building production ML systems with strong Python and GPU/cloud infrastructure skills.

139k – 218k | United States | ML Engineering | Remote | 4+ YOE | CI/CD | Python