Data Scientist II - Big Data R&D, Identity Graph & KYC

Develop graph-based algorithms, entity-resolution systems, and data pipelines on massive PII datasets to power KYC and compliance products. Requires a Master's degree with 2+ years of experience or a Ph.D. with 1+ years of experience, Python/SQL proficiency, Spark, and ML libraries.

140k – 170k · Carson City, NV · Data Engineering · Remote · 2+ YOE

About the role

What You'll Do

  • Contribute to the design and implementation of machine learning, data mining, statistical, and graph-based algorithms to analyze very large datasets for identity verification and anomaly detection.
  • Analyze large datasets to help develop and refine entity-resolution and identity-matching algorithms that drive Socure’s KYC and compliance solutions.
  • Build and maintain components of data-processing pipelines (ETL, feature generation, normalization) using tools such as Spark/PySpark and AWS (e.g., EMR, S3).
  • Support senior data scientists with feature engineering, data exploration, error analysis, and A/B test setup for new models and signals.
  • Help evaluate new third‑party and internal data sources: profile data quality, design offline experiments, and summarize impact on coverage and model performance.
  • Implement and maintain SQL and Python/R code for data extraction, transformation, and validation; contribute to code reviews and basic testing.
  • Provide analytical support to compliance and regulatory product teams, including ad hoc investigations, simple dashboards, and data deep dives.
  • Communicate findings in a clear, structured way to peers and cross‑functional partners (Product, Engineering, Client Analysis), focusing on key insights and trade‑offs.
  • Work effectively in a fast‑paced, cross‑functional environment; demonstrate ownership of well-scoped tasks and follow through to completion.

What You Bring

  • Master’s degree with 2+ years of experience, or Ph.D. with 1+ years of experience in a data science or analytics role, or equivalent practical experience.
  • Proficiency in at least one general-purpose programming language used in data science (Python or Scala).
  • Solid experience writing and optimizing SQL for large datasets; comfort working in data lake / warehouse environments.
  • Hands‑on experience with Spark or PySpark and common ML libraries (e.g., scikit-learn, XGBoost, TensorFlow/PyTorch a plus).
  • Familiarity with UNIX environments and the AWS ecosystem (e.g., EMR, S3); Databricks experience is a plus.
  • Working knowledge of supervised/unsupervised ML and basic statistics (similarity measures, clustering, evaluation metrics).
  • Exposure to graph techniques or graph databases (Neo4j, AWS Neptune, GraphFrames) is a strong plus.
  • Bonus: experience with Elasticsearch or DynamoDB; workflow tools such as Airflow for automating data pipelines.

Skills

Python, SQL, Spark, PySpark, scikit-learn, XGBoost, AWS, EMR, S3, Databricks, Neo4j, GraphFrames, TensorFlow, PyTorch, Airflow

Data Science Engineer, Analytics

Build data pipelines, models, dashboards, and analyses to support product and business decision-making. Requires 2+ years of Python/SQL experience with data modeling, ETL tools, and AWS.

145k – 160k · San Diego, CA · Data Engineering · Remote · 2+ YOE · SQL · dbt

Software Engineer, Data Migration & Code Generation

Develop backend systems for MongoDB data migration and code generation tools using Java, Kafka, and Debezium. Requires 2+ years of experience in distributed systems and streaming platforms, plus strong CS fundamentals.

135k – 203k · California +7 · Data Engineering · Remote · 2+ YOE · Java · Rust

Data Engineer

As a Data Engineer, you will build and maintain data pipelines, dbt models, and infrastructure on AWS and Snowflake. You will partner with BI/Analytics Engineering, take operational responsibility, and mentor junior team members.

130k – 145k · United States · Data Engineering · Remote · 2+ YOE · SQL · AWS

Software Engineer II, Customer Lifecycle Engineering

Software Engineer building data models, pipelines, and systems for measuring, monetizing, and optimizing customer lifecycle at an AI marketing platform.

130k – 170k · United States · Data Engineering · Remote · 2+ YOE · SQL · ETL

Analytics Engineer

Build and maintain production data models and pipelines for Coinbase's Compliance Data Mart, ensuring regulatory-grade data quality and supporting audits and exams.

152k – 179k · United States · Data Engineering · Remote · 2+ YOE · SQL · dbt