Software Engineer, Data Infrastructure

180k – 300k | Redwood City, CA | Onsite
Summary

Builds and maintains scalable data processing pipelines and backend systems for a data curation platform that optimizes training data for ML models. Partners with researchers to integrate research capabilities, ensuring reliability and security for customer data.

About the role

What You'll Work On

  • Design, build, and maintain highly scalable data processing solutions that are reliable and secure
  • Architect, build, and deploy the back-end systems and services that power our data curation platform
  • Partner with researchers and engineers to bring new features and research capabilities to our customers
  • Ensure that our systems are reliable, secure, and worthy of our customers' trust

About You

  • You have meaningful experience leading and building production data systems that deliver on major product initiatives
  • You have built and managed highly scalable data processing solutions (e.g., Spark, Flink), data lakes or warehouses (e.g., Snowflake, Hive), and distributed storage systems (e.g., HDFS, S3); authored SQL queries; used workflow management tools (e.g., Airflow, Dagster); and maintained the infrastructure that supports them
  • Proficiency in at least one programming language commonly used within Data Engineering, such as Python, Scala, or Java
  • Expertise with an ETL scheduler such as Airflow, Dagster, or a similar framework
  • Experience maintaining a high quality bar for design, correctness, and testing
  • Take pride in building and operating scalable, reliable, secure systems
  • Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed
  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done
  • You have experience being the technical lead of a Data Engineering / Platform / Infrastructure Team
  • Experience building ML/DL systems and/or data infrastructure that feeds into training large ML models

Compensation

  • Base salary ranges from $180,000 to $300,000
  • Comprehensive benefits: 100% covered health benefits (medical, vision, dental), 401(k) with 4% company match, unlimited PTO, annual wellness stipend ($2,000), learning stipend ($1,000), daily lunches/snacks, relocation assistance
Skills
Spark, Flink, Snowflake, Hive, SQL, HDFS, S3, Airflow, Dagster, Python, Scala, Java, ETL