Research Engineer, Data

Research Engineers build data systems and pipelines that power reliable AI workflows for enterprise customers. They design evaluation frameworks, develop data quality systems, and collaborate with researchers and engineers to turn frontier AI concepts into production-ready solutions.

150k – 250k | San Francisco, CA | New York, NY | Data Engineering | Hybrid

About the role

Key Responsibilities

Design and build data systems that power reliable AI workflows across enterprise environments
Develop pipelines for collecting, cleaning, transforming, labeling, and evaluating domain-specific data used by AI systems
Create data quality frameworks that identify coverage gaps, ambiguity, drift, duplication, leakage, and other failure modes
Build tools and workflows that help teams turn raw customer data into usable context for retrieval, evaluation, reasoning, and execution
Partner with AI Researchers and AI Engineers to understand how data quality affects system behavior and production outcomes
Develop synthetic data, annotation, and feedback-loop strategies to improve system performance in areas where real-world data is sparse or noisy
Analyze customer workflows and datasets to determine what information AI systems need, where that information should come from, and how it should be represented
Communicate clearly with internal teams and customer stakeholders about data assumptions, limitations, risks, and tradeoffs

Requirements

Experience building data systems for AI: built data pipelines, evaluation datasets, labeling workflows, retrieval corpora, or similar systems that improve model or agent behavior
Strong data engineering fundamentals: write clean Python and SQL, understand data modeling and pipeline reliability, and build systems that remain maintainable under production constraints
Research-oriented builder: comfortable investigating how data quality, structure, and representation affect AI system performance
AI-native working style: use AI tools daily to accelerate coding, analysis, debugging, exploration, and workflow automation
Comfort with ambiguous data: reason through messy enterprise datasets, incomplete documentation, conflicting business definitions, and changing requirements
Bias towards measurement: prefer to make data quality and system behavior observable through concrete metrics, evaluations, and experiments
Customer environment readiness: work directly with customer teams to understand their data, ask precise questions, and explain tradeoffs clearly
Ownership mentality: take responsibility for whether the data layer enables the AI system to deliver reliable value in production

Skills

Python, SQL, Data Pipelines, Data Quality Frameworks, Evaluation Datasets, Labeling Workflows, Retrieval Corpora, Synthetic Data Generation, Data Modeling, AI System Evaluation

Similar roles

Data Engineering jobs

Confido Legal

Analytics Engineer

Build and maintain Confido's centralized data warehouse and analytics infrastructure. Design scalable data models, establish data standards, and enable self-service analytics across the organization.

150k – 190k | New York, NY | Data Engineering | On-site | SQL | dbt

Mochi Health

Data Manager

Mochi Health is seeking a Data Manager to lead their Data team, driving execution across analytics, data engineering, and operational data work. This role requires strong technical depth in SQL and Python, with a focus on data quality, reliability, and compliance.

150k – 220k | San Francisco, CA | Data Engineering | On-site | 4+ YOE | SQL | Python

HappyRobot

Data Operations Lead

Leads data operations for AI audio models, translating research needs into actionable specifications, managing end-to-end data acquisition and annotation with internal and vendor teams, and scaling labeling workflows and workforce.

150k – 200k | San Francisco, CA | Data Engineering | Hybrid | Data Labeling | Data Annotation

Joyful Health

Data Engineer

Build scalable data pipelines and integrations for an AI-powered financial operating system in healthcare. Requires 5+ years data engineering experience with Python, distributed systems, AI pipelines, and big data technologies; NYC-based hybrid role.

150k – 275k | New York, NY | Data Engineering | Hybrid | 5+ YOE | SQL | CI/CD

Sieve

Software Engineer

Builds and scales data pipelines for video AI datasets, owning end-to-end projects from sourcing to delivery with ML filters and dashboards. Requires strong Python skills, clean code practices, and customer communication.

150k – 300k | San Francisco, CA | Data Engineering | On-site | Go | Python