Data Engineer

Builds and optimizes scalable data pipelines, storage, and OLAP databases for ML training, analytics, and product features. Requires 5+ years in data engineering, proficiency in Python/SQL/cloud platforms, and distributed systems experience.

185k – 217k · San Francisco, CA · Data Engineering · Hybrid · 5+ YOE

About the role

What You’ll Do

  • Build and maintain scalable data services, pipelines and storage solutions for the feedback of unstructured application data for ML training and evaluation purposes.
  • Build and manage OLAP databases, ELTs and general data tooling for analytics, business decisions and product features.
  • Work closely with a team of frontend and backend engineers, product managers, and analysts.
  • Optimize data infrastructure to enhance the throughput, latency and reliability of the data system.
  • Investigate and correct issues identified through data operations monitors, tools, and reports.
  • Design data integrations and a data quality framework.

What You’ll Bring

  • 5+ years of experience in Data Engineering or Backend Engineering with a focus on data systems.
  • Proficiency in at least one general-purpose programming language (e.g., Python, Java, Scala) and SQL (any variant).
  • Proficiency with at least one modern cloud provider (GCP, AWS, Azure) and its accompanying data services.
  • Experience building systems that handle the ingestion, transformation, and management of both structured and unstructured data.
  • Deep knowledge of modern data infrastructure best practices.
  • Experience with distributed systems and a range of distributed processing frameworks.
  • Experience with Terraform, Kubernetes, and containerization technologies.
  • Familiarity with deploying ML models at scale is a bonus.
  • Experience in building data products that are well-modeled, documented and easy to understand and maintain.
  • Ability to prioritize amid changing priorities in a fast-moving environment.

Skills

Python · Java · Scala · SQL · GCP · AWS · Azure · Terraform · Kubernetes · OLAP · ELT · Distributed Systems

Software Engineer, Data Infrastructure

Builds and operates scalable data infrastructure including compute fleets, storage systems, and streaming platforms to support OpenAI's AI products, research, and analytics. Requires 4+ years in data or infrastructure engineering with expertise in Spark, Kafka, and distributed systems.

185k – 385k · San Francisco, CA · Data Engineering · Hybrid · 4+ YOE · Spark · Kafka

Software Engineer, Data Infrastructure

Architect and build foundational data infrastructure for massive simulation outputs. Design novel data models and high-throughput pipelines to feed LLMs with structured context from complex, state-based environments.

186k – 233k · New York, NY +1 · Data Engineering · On-site · 5+ YOE · Go · C++

Data Engineer

Own data and analytics end-to-end: architect internal systems, build metrics/dashboards, and translate customer and product signals into structured inputs for AI agents.

180k – 210k · New York, NY · Data Engineering · On-site · SQL · dbt

Data Engineer

Builds and scales internal data platform by designing data models, pipelines, and analytics infrastructure to transform raw product/business data into reliable datasets for company-wide decision-making. Partners with stakeholders across Product, Engineering, Finance, Marketing, and Sales.

180k – 250k · San Francisco, CA +1 · Data Engineering · Hybrid · Kafka · Airflow

Software Engineer - Data Platform

Builds and operates petabyte-scale data platform infrastructure using Kafka, Spark, Flink, and Trino to power real-time ML pipelines and analytics. Requires expertise in distributed systems, stream processing, and systems languages like Rust, Go, or Scala.

180k – 440k · Palo Alto, CA +1 · Data Engineering · Hybrid · Go · HDFS