Data Engineer

240k – 280k | Palo Alto, CA | Onsite | 1+ YOE | Apr 24

Summary

Build and maintain production data pipelines for AI model training, ensuring high-quality data through cleaning, transformation, and evaluation. Requires 1+ years of data or software engineering experience, experience with language models or neural networks, and a bachelor's degree in a STEM field; strong Python skills are preferred.

About the role

Responsibilities

Analyze the performance and impact of data used throughout the model training lifecycle
Investigate anomalous model behavior and rigorously identify the data issues that drive poor downstream performance
Design, build, and improve the data cleaning, transformation, and quality-control steps required to produce high-quality training data
Research, evaluate, and develop frontier methods for improving data quality and effectiveness in AI model development
Apply statistical techniques and empirical analysis to make informed, data-driven decisions about dataset quality and model outcomes
Partner across teams to identify where data needs exist and define the highest-impact opportunities for new data acquisition and improvement
Build and maintain production-grade data pipelines, tooling, and software systems that ingest, process, validate, and deliver data for training
Develop metrics, evaluation frameworks, and monitoring systems to assess how data quality influences model behavior at scale
Fuse data from multiple sources into reliable, usable datasets for research and production model training
Create shared datasets, tooling, and internal data products that enable other teams to analyze, debug, and improve model performance

Basic Qualifications

Bachelor’s degree in computer science, data science, physics, mathematics, or another STEM discipline
1+ years of data/software engineering experience (internship experience is applicable)
Experience implementing or analyzing language models or neural networks

Preferred Skills and Experience

Professional experience in analytics, data science, machine learning, or data engineering
Experience building and operating production data pipelines for neural network or large-scale machine learning workloads
Strong experience with Python and the broader ecosystem of libraries and tools used in modern machine learning and data development
Experience working with Parquet or similar columnar storage formats in large-scale data systems
Familiarity with Kubernetes and distributed production environments
Experience developing predictive models and machine learning pipelines, including clustering, forecasting, anomaly detection, or related techniques
Experience working with very large-scale datasets, including terabyte- to petabyte-scale data systems
Strong statistical intuition and the ability to use quantitative analysis to guide technical and product decisions, including familiarity with scaling ladder design studies
Ability to operate effectively in a dynamic environment with evolving priorities, changing requirements, and fast-moving technical challenges
Demonstrated ability to take ownership of ambiguous problems, drive projects independently, and develop new expertise where needed

Compensation and Benefits

$240,000 - $280,000 USD base salary
Equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks

Skills

Python, Kubernetes, Parquet, data pipelines, machine learning, neural networks, language models, statistics, distributed systems, large-scale datasets
