
Senior Data Scientist - Big Data R&D, Identity Graph & KYC

Leads design and deployment of ML, statistical, and graph algorithms for entity resolution, identity graphs, and KYC on large-scale PII datasets. Builds scalable pipelines in Spark/PySpark on AWS/Databricks, runs experiments, and mentors juniors. Requires a Master's degree with 3+ years of experience or a Ph.D. with 1+ years.

140k – 170k | Carson City, NV | ML Engineering | Remote | 3+ YOE

About the role

What You'll Do

  • Own the design, development, and evaluation of machine learning, statistical, and graph-based algorithms for entity resolution, identity trust scoring, and anomaly detection on massive datasets.
  • Architect and optimize graph-based identity representations (identity graph structure, linkage rules, clustering) to improve match rates, reduce false positives/negatives, and support downstream fraud and KYC models.
  • Build and maintain scalable data pipelines and feature stores in Spark/PySpark (or Scala), including data normalization, deduplication, and feature computation across large PII datasets in AWS/Databricks environments.
  • Lead A/B tests and offline/online experimentation for new models, features, and data sources; define success metrics, design experiments, and ensure rigorous validation before rollout.
  • Evaluate new internal and external data sources: explore signal quality, design backtests, quantify incremental value, and provide clear recommendations on vendor selection and integration.
  • Partner closely with product managers and engineers to translate ambiguous business and regulatory requirements (e.g., KYC coverage, watchlist matching) into concrete modeling and data roadmaps.
  • Provide deep analytical support to Socure’s compliance and regulatory product suite, including investigative analyses, root‑cause analysis for anomalies, and clear narratives for internal and external stakeholders.
  • Contribute to model governance and documentation: clearly explain model logic, data dependencies, limitations, and monitoring plans to internal risk/compliance stakeholders.
  • Mentor junior data scientists and engineers on best practices in data exploration, feature engineering, experimentation, and code quality.
  • Communicate complex technical concepts and trade‑offs in a concise, structured way to both technical and non‑technical audiences (e.g., product reviews, customer meetings, internal briefings).

What You Bring

  • Master’s degree with 3+ years of relevant industry experience, or Ph.D. with 1+ years of experience in applied ML / data science roles; background in Computer Science, Statistics, Mathematics, or related quantitative fields preferred.
  • Strong proficiency in Python (preferred) or Scala, including experience with ML libraries such as scikit-learn, XGBoost, TensorFlow or PyTorch.
  • Extensive experience with Spark or PySpark and distributed data systems (e.g., AWS EMR, Databricks), including work on very large, messy datasets.
  • Deep understanding of supervised and unsupervised learning, feature engineering, model evaluation, and experiment design (A/B testing, holdout strategies, stratification).
  • Experience developing production-quality data pipelines and automated workflows using Airflow or similar orchestration tools.
  • Practical familiarity with graph databases and/or graph frameworks (Neo4j, AWS Neptune, GraphFrames, DGL, PyTorch Geometric) and graph algorithms for clustering, link prediction, and community detection is strongly preferred.
  • Solid SQL skills and experience working with large-scale analytical data stores.
  • Experience in at least one of: identity verification, fraud detection, credit risk, or adjacent high‑stakes domains is a plus.

Skills

Python, PySpark, Spark, AWS, Databricks, scikit-learn, XGBoost, TensorFlow, PyTorch, SQL, Airflow, Neo4j, GraphFrames, A/B Testing, Machine Learning

Similar roles


Senior Data Scientist - International eKYC, Identity Graph

Leads development of ML and graph-based systems for international eKYC and identity graph, focusing on entity resolution, anomaly detection, and regulatory compliance across global markets. Requires 6+ years of ML experience and expertise in Python, SQL, Spark, and graph technologies such as Neo4j and AWS Neptune.

140k – 170k | Carson City, NV | ML Engineering | Remote | 6+ YOE | SQL, DGL

Senior Software Engineer, Agents

Build and scale AI agents automating healthcare workflows. Lead complex feature development using JavaScript/TypeScript/Node.js, LLMs, and DevOps tools. Requires 4+ years of experience and a CS degree.

140k – 230k | Mountain View, CA | ML Engineering | On-site | 4+ YOE | RAG, GPT

Senior AI Engineer

Senior AI Engineer building internal AI-powered solutions end-to-end for GitLab's Sales, Marketing, and Support teams. Responsible for diagnosing problems, selecting models, designing agentic systems with guardrails, and shipping production solutions that improve organizational flow.

139k – 218k | United States | ML Engineering | Remote | 5+ YOE | RAG, LLMs

Senior Machine Learning Engineer

Senior ML Engineer focused on fine-tuning and deploying LLMs and generative AI features into Firefox, emphasizing privacy, latency, and user experience.

139k – 218k | United States | ML Engineering | Remote | 4+ YOE | Ray, LLMs

Senior Machine Learning Engineer, AI Platform

Design, build, and operate Mozilla's AI platform for training, deploying, and serving ML models at scale. Requires 4–6 years of experience building production ML systems, with strong Python and GPU/cloud infrastructure skills.

139k – 218k | United States | ML Engineering | Remote | 4+ YOE | CI/CD, Python