# Research Engineer, Data
**Company:** [Distyl AI](https://hotfix.jobs/companies/distyl-ai)
**Location:** San Francisco, CA; New York, NY
**Salary:** $150K-$250K
**Skills:** Python, SQL, Data Pipelines, Data Quality Frameworks, Evaluation Datasets, Labeling Workflows, Retrieval Corpora, Synthetic Data Generation, Data Modeling, AI System Evaluation
**Posted:** 2026-06-22
> Research Engineers build data systems and pipelines that power reliable AI workflows for enterprise customers. They design evaluation frameworks, develop data quality systems, and collaborate with researchers and engineers to turn frontier AI concepts into production-ready solutions.
## Job Description
## Key Responsibilities
- Design and build data systems that power reliable AI workflows across enterprise environments
- Develop pipelines for collecting, cleaning, transforming, labeling, and evaluating domain-specific data used by AI systems
- Create data quality frameworks that identify coverage gaps, ambiguity, drift, duplication, leakage, and other failure modes
- Build tools and workflows that help teams turn raw customer data into usable context for retrieval, evaluation, reasoning, and execution
- Partner with AI Researchers and AI Engineers to understand how data quality affects system behavior and production outcomes
- Develop synthetic data, annotation, and feedback-loop strategies to improve system performance in areas where real-world data is sparse or noisy
- Analyze customer workflows and datasets to determine what information AI systems need, where that information should come from, and how it should be represented
- Communicate clearly with internal teams and customer stakeholders about data assumptions, limitations, risks, and tradeoffs

## Requirements
- Experience building data systems for AI: have built data pipelines, evaluation datasets, labeling workflows, retrieval corpora, or similar systems that improve model or agent behavior
- Strong data engineering fundamentals: write clean Python and SQL, understand data modeling and pipeline reliability, and build systems that stay maintainable under production constraints
- Research-oriented builder: comfortable investigating how data quality, structure, and representation affect AI system performance
- AI-native working style: use AI tools daily to accelerate coding, analysis, debugging, exploration, and workflow automation
- Comfort with ambiguous data: reason through messy enterprise datasets, incomplete documentation, conflicting business definitions, and changing requirements
- Bias toward measurement: make data quality and system behavior observable through concrete metrics, evaluations, and experiments
- Customer environment readiness: work directly with customer teams to understand their data, ask precise questions, and explain tradeoffs clearly
- Ownership mentality: take responsibility for whether the data layer enables the AI system to deliver reliable value in production

**Apply:** https://hotfix.jobs/jobs/research-engineer-data-at-distyl-ai-f9814128-07b8-4b2f-bcfe-1832cf5e9e26
**Canonical:** https://hotfix.jobs/jobs/research-engineer-data-at-distyl-ai-f9814128-07b8-4b2f-bcfe-1832cf5e9e26