# Senior Backend Engineer, Data Modeling and Ingestion Platform
**Company:** [Udio](https://hotfix.jobs/companies/udio)
**Location:** New York, NY
**Salary:** $180K-$220K
**Skills:** Python, BigQuery, Apache Beam, Dataflow, Ray, TFRecords, JAX, GCP, Spark, Flink
**Posted:** 2026-01-21
> Leads unification of large heterogeneous datasets for generative audio models by building scalable ingestion, entity resolution, deduplication, and enrichment systems. Collaborates with ML researchers, works with tools such as BigQuery, Dataflow, and Ray, and prepares data in ML-ready formats.
## Job Description
## What You'll Do

- Build high-throughput bulk ingestion workflows to integrate datasets from multiple external providers.
- Design and implement scalable entity-resolution solutions, including record linking, deduplication, clustering, and conflict arbitration.
- Create and refine matching logic, decision rules, and similarity functions to align datasets with high accuracy and strong coverage.
- Define and track data quality indicators, such as overlap metrics, match precision/recall, duplicate rates, and completeness.
- Prepare training-ready datasets in formats such as **TFRecords**, and structure data to meet ML research requirements.
- Develop processing components using **Dataflow (Beam)** and manage large analytical workloads in **BigQuery**.
- Leverage frameworks like **Ray** to accelerate large-scale experiments, feature extraction, and research-oriented data preparation.
- Collaborate with ML researchers to anticipate downstream requirements and evolve linkage strategies as new sources and use cases emerge.

## What We're Looking For

- Experience working with large, heterogeneous datasets from multiple providers or domains.
- Strong background in entity resolution, deduplication, data unification, or related large-scale data integration techniques.
- Proficiency in **Python**, with an emphasis on efficient, scalable data processing.
- Experience with **BigQuery**, **Google Dataflow/Apache Beam**, or similar batch-processing frameworks.
- Familiarity with data validation, normalization, reconciliation, and building consistent views across diverse data sources.
- Ability to craft well-structured matching and decision strategies that balance accuracy, completeness, and computational efficiency.
- Comfortable iterating quickly on pragmatic solutions, balancing correctness with time-to-delivery.
- Clear communication skills and the ability to collaborate closely with ML and research teams.

## Nice to Have

- Experience architecting systems on **Google Cloud Platform** at scale.
- Experience with distributed compute frameworks such as **Ray**, **Spark**, or **Flink**.
- Understanding of **JAX**-based ML pipelines, multihost training setups, or large-scale data preparation for accelerator-backed workflows.
- Familiarity with **TFRecords** or other high-volume training data formats.
- Exposure to ranking, clustering, or statistical similarity modeling.
- Experience with **Go**, **Next.js**, and/or **React Native** to contribute to full-stack development.

## Compensation

Base salary range: **$180,000 - $220,000**, plus equity and benefits.
**Apply:** https://hotfix.jobs/jobs/senior-backend-engineer-data-modeling-and-ingestion-platform-at-udio-5a12ecea-f539-4f70-bd0e-9a07278f83e8
**Canonical:** https://hotfix.jobs/jobs/senior-backend-engineer-data-modeling-and-ingestion-platform-at-udio-5a12ecea-f539-4f70-bd0e-9a07278f83e8