Data Engineer

Builds and optimizes scalable data pipelines, storage, and OLAP databases for ML training, analytics, and product features. Requires 5+ years in data engineering, proficiency in Python/SQL/cloud platforms, and distributed systems experience.

185k – 217k | San Francisco, CA | Data Engineering | Hybrid | 5+ YOE

About the role

What You’ll Do

Build and maintain scalable data services, pipelines and storage solutions that feed unstructured application data back into ML training and evaluation.
Build and manage OLAP databases, ELT pipelines and general data tooling for analytics, business decisions and product features.
Work closely with a team of frontend and backend engineers, product managers, and analysts.
Optimize data infrastructure to enhance the throughput, latency and reliability of the data system.
Investigate and correct issues identified through data operations monitors, tools, and reports.
Design data integrations and a data quality framework.

What You’ll Bring

5+ years of experience in Data Engineering or Backend Engineering with a focus on data systems.
Proficiency in at least one general-purpose programming language (e.g., Python, Java, Scala) and SQL (any variant).
Proficiency with at least one modern cloud provider (GCP, AWS, Azure) and its accompanying data services.
Experience building systems that handle the ingestion, transformation, and management of both structured and unstructured data.
Deep knowledge of modern data infrastructure best practices.
Experience with distributed systems and distributed processing frameworks.
Experience with Terraform, Kubernetes, and containerization technologies.
Familiarity with deploying ML models at scale is a bonus.
Experience in building data products that are well-modeled, documented and easy to understand and maintain.
Ability to prioritize amid shifting demands in a fast-moving environment.

Skills

Python, Java, Scala, SQL, GCP, AWS, Azure, Terraform, Kubernetes, OLAP, ELT, Distributed Systems

Similar roles

Data Engineering jobs

OpenAI

Software Engineer, Data Infrastructure

Builds and operates scalable data infrastructure including compute fleets, storage systems, and streaming platforms to support OpenAI's AI products, research, and analytics. Requires 4+ years in data or infrastructure engineering with expertise in Spark, Kafka, and distributed systems.

185k – 385k | San Francisco, CA | Data Engineering | Hybrid | 4+ YOE | Spark, Kafka

Scale AI

Software Engineer, Data Infrastructure

Architect and build foundational data infrastructure for massive simulation outputs. Design novel data models and high-throughput pipelines to feed LLMs with structured context from complex, state-based environments.

186k – 233k | New York, NY +1 | Data Engineering | On-site | 5+ YOE | Go, C++

Actively AI

Data Engineer

Own data and analytics end-to-end: architect internal systems, build metrics/dashboards, and translate customer and product signals into structured inputs for AI agents.

180k – 210k | New York, NY | Data Engineering | On-site | SQL, dbt

Baseten

Data Engineer

Builds and scales internal data platform by designing data models, pipelines, and analytics infrastructure to transform raw product/business data into reliable datasets for company-wide decision-making. Partners with stakeholders across Product, Engineering, Finance, Marketing, and Sales.

180k – 250k | San Francisco, CA +1 | Data Engineering | Hybrid | Kafka, Airflow

xAI

Software Engineer - Data Platform

Builds and operates petabyte-scale data platform infrastructure using Kafka, Spark, Flink, and Trino to power real-time ML pipelines and analytics. Requires expertise in distributed systems, stream processing, and systems languages like Rust, Go, or Scala.

180k – 440k | Palo Alto, CA +1 | Data Engineering | Hybrid | Go, HDFS