Senior Backend Engineer, Data Modeling and Ingestion Platform

Leads unification of large heterogeneous datasets for generative audio models by building scalable ingestion, entity resolution, deduplication, and enrichment systems. Collaborates with ML researchers, works with tools such as BigQuery, Dataflow, and Ray, and prepares data in ML-ready formats.

180k – 220k | New York, NY | Backend Engineering | Onsite

About the role

What You'll Do

Build high-throughput bulk ingestion workflows to integrate datasets from multiple external providers.
Design and implement scalable entity-resolution solutions, including record linking, deduplication, clustering, and conflict arbitration.
Create and refine matching logic, decision rules, and similarity functions to align datasets with high accuracy and strong coverage.
Define and track data quality indicators, such as overlap metrics, match precision/recall, duplicate rates, and completeness.
Prepare training-ready datasets in formats such as TFRecords, and structure data to meet ML research requirements.
Develop processing components using Dataflow (Beam) and manage large analytical workloads in BigQuery.
Leverage frameworks like Ray to accelerate large-scale experiments, feature extraction, and research-oriented data preparation.
Collaborate with ML researchers to anticipate downstream requirements and evolve linkage strategies as new sources and use cases emerge.

What We're Looking For

Experience working with large, heterogeneous datasets from multiple providers or domains.
Strong background in entity resolution, deduplication, data unification, or related large-scale data integration techniques.
Proficiency in Python, with an emphasis on efficient, scalable data processing.
Experience with BigQuery, Google Dataflow/Apache Beam, or similar batch-processing frameworks.
Familiarity with data validation, normalization, reconciliation, and building consistent views across diverse data sources.
Ability to craft well-structured matching and decision strategies that balance accuracy, completeness, and computational efficiency.
Comfort iterating quickly on pragmatic solutions, balancing correctness with time-to-delivery.
Clear communication skills and the ability to collaborate closely with ML and research teams.

Nice to Have

Knowledge of architecting Google Cloud Platform systems at scale.
Experience with distributed compute frameworks such as Ray, Spark, or Flink.
Understanding of JAX-based ML pipelines, multihost training setups, or large-scale data preparation for accelerator-backed workflows.
Familiarity with TFRecords or other high-volume training data formats.
Exposure to ranking, clustering, or statistical similarity modeling.
Experience with Go, Next.js, and/or React Native to contribute to full-stack development.

Compensation

Base salary range: $180,000 - $220,000, plus equity and benefits.

Skills

Python, BigQuery, Apache Beam, Dataflow, Ray, TFRecords, JAX, GCP, Spark, Flink
