Software Engineer, Data Infrastructure

180k – 300k | Redwood City, CA | Onsite | Mar 20

Summary

Builds and maintains scalable data processing pipelines and backend systems for a data curation platform that optimizes training data for ML models. Partners with researchers to integrate research capabilities, ensuring reliability and security for customer data.

About the role

What You'll Work On

Design, build, and maintain highly scalable data processing solutions while ensuring reliability and security
Architect, build, and deploy the back-end systems and services that power our data curation platform
Partner with researchers and engineers to bring new features and research capabilities to our customers
Ensure that our systems are reliable, secure, and worthy of our customers' trust

About You

Have meaningful experience leading and building production data systems that deliver on major product initiatives
You have built and managed highly scalable data processing solutions (e.g., Spark, Flink), data lakes or warehouses (e.g., Snowflake, Hive), and distributed storage systems (e.g., HDFS, S3); authored SQL queries; used workflow management tools (e.g., Airflow, Dagster); and maintained the infrastructure that supports these
Proficiency in at least one programming language commonly used within Data Engineering, such as Python, Scala, or Java
Expertise with an ETL scheduler such as Airflow, Dagster, or a similar framework
Experience maintaining a high quality bar for design, correctness, and testing
Take pride in building and operating scalable, reliable, secure systems
Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed
Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done
You have experience being the technical lead of a Data Engineering / Platform / Infrastructure Team
Experience building ML/DL systems and/or data infrastructure that feeds into training large ML models

Compensation

Base salary ranges from $180,000 to $300,000
Comprehensive benefits: 100% covered health benefits (medical, vision, dental), 401(k) with 4% company match, unlimited PTO, annual wellness stipend ($2,000), learning stipend ($1,000), daily lunches/snacks, relocation assistance

Skills

Spark, Flink, Snowflake, Hive, SQL, HDFS, S3, Airflow, Dagster, Python, Scala, Java, ETL

Similar roles at this salary range

Discord

Jun 18

Software Engineer, Data Platform

Build and maintain data infrastructure processing petabytes of data. Own end-to-end projects for data ingestion, transformation, and serving systems. Requires 3+ years of software engineering experience.

160k – 200k | United States | Data Engineering | On-site | 3+ YOE | Go, SQL

Twilio

Jun 17

Staff Analytics Engineer

Design and maintain a robust business data layer in dbt to enable trusted GTM sales analytics, reporting, data science, and AI capabilities. Requires 8+ years in analytics engineering with advanced SQL and dbt expertise.

156k – 229k | United States | Data Engineering | Remote | 8+ YOE | SQL, dbt

11x

Jun 16

Data Engineer

Own and extend customer data ingestion platform and large-scale pipelines powering AI workers. Build data lake, retrieval layer, and infrastructure for syncing, enriching, and querying customer data across CRMs and third-party systems.

170k – 200k | United States | Data Engineering | Remote | 4+ YOE | Python, Airbyte

Sesame

Jun 15

Data Engineer, Machine Learning

Build and maintain production data pipelines that prepare conversational, voice, and multimodal data for ML model training and evaluation. Partner closely with ML engineers to deliver clean, versioned datasets and enforce data quality and governance.

170k – 240k | San Francisco, CA | Data Engineering | On-site | 5+ YOE | SQL, ETL

Okta

Jun 12

Staff Software Engineer, Data Platform

Staff Software Engineer building and scaling high-volume, low-latency distributed data platform services and analytics infrastructure using Java, Kinesis, Flink, Snowflake, and Kubernetes. Requires 8+ years experience and U.S. Person status for FedRAMP access.

194k – 267k | San Francisco, CA | Data Engineering | Hybrid | 8+ YOE | AWS, Java
