Data Engineer

Builds and scales the internal data platform by designing data models, pipelines, and analytics infrastructure that transform raw product and business data into reliable datasets for company-wide decision-making. Partners with stakeholders across Product, Engineering, Finance, Marketing, and Sales.

180k – 250k | San Francisco, CA | New York, NY | Data Engineering | Hybrid

About the role

Responsibilities

  • Design and maintain core data models and semantic layers
  • Develop and orchestrate batch and streaming data pipelines using technologies such as Apache Beam, Kafka, Airflow, or similar frameworks
  • Analyze inference and infrastructure telemetry, including data from OpenTelemetry, Grafana, and other observability tools
  • Define and maintain company-wide metrics across product usage, performance, and customer lifecycle
  • Enable self-service analytics through agents and tools, with well-structured semantic layers and context
  • Ensure data reliability and quality through testing, documentation, and governance

Preferred Qualifications

  • Understanding of inference metrics such as latency, throughput, token usage, and model performance
  • Experience supporting B2B SaaS and/or consumption-based platforms
  • Application of forecasting and predictive modeling (e.g., ARIMA, Prophet) to business processes

Benefits

  • Competitive compensation, including meaningful equity
  • 100% coverage of medical, dental, and vision insurance for employees and dependents
  • Generous PTO policy, including a company-wide Winter Break
  • Paid parental leave
  • Company-facilitated 401(k)

Skills

Apache Beam, Kafka, Airflow, OpenTelemetry, Grafana, Data Pipelines, Semantic Layers, Batch Processing, Streaming Data, Data Modeling

Data Engineer

Own data and analytics end-to-end: architect internal systems, build metrics/dashboards, and translate customer and product signals into structured inputs for AI agents.

180k – 210k | New York, NY | Data Engineering | On-site | SQL | dbt

Software Engineer - Data Platform

Builds and operates petabyte-scale data platform infrastructure using Kafka, Spark, Flink, and Trino to power real-time ML pipelines and analytics. Requires expertise in distributed systems, stream processing, and languages such as Rust, Go, or Scala.

180k – 440k | Palo Alto, CA +1 | Data Engineering | Hybrid | Go | HDFS

Software Engineer, Distributed Data Systems

Architects and builds massive-scale data infrastructure for web crawling, embedding model training, and real-time search, handling hundreds of petabytes. Requires expertise in lakehouse architectures, distributed processing pipelines, and streaming systems like Kafka and Flink.

180k – 350k | San Francisco, CA | Data Engineering | On-site | Ray | Hudi

Software Engineer, Data Infrastructure

Builds and maintains scalable data processing pipelines and backend systems for a data curation platform that optimizes training data for ML models. Partners with researchers to integrate research capabilities, ensuring reliability and security for customer data.

180k – 300k | Redwood City, CA | Data Engineering | On-site | S3 | SQL

Data Engineer

Designs and owns mission-critical data pipelines to enable decision-making across data science, growth, sales, marketing, and product teams. Requires 5+ years of experience with scalable pipelines (preferably Airflow), Python, and advanced SQL.

180k – 222k | San Francisco, CA +4 | Data Engineering | On-site | 5+ YOE | SQL | Python