Data & ML Pipeline Software Engineer

Builds large-scale data processing pipelines and ML infrastructure to automate data curation, model training, and iteration for autonomous vehicles using real-world and simulation data. Requires 3–5 years of experience in data/ML infrastructure, proficiency in Python, and knowledge of frameworks such as Spark, Airflow, and Kafka.

125k – 222k | Sunnyvale, CA | Data Engineering | Onsite | 3+ YOE

About the role

Responsibilities

  • Build and maintain large-scale data processing pipelines (ETL) for ingesting and curating driving datasets.
  • Design and implement systems that automate data selection, labeling, training, and testing loops.
  • Collaborate with modeling teams to improve training efficiency and model performance across iterations.
  • Develop the core infrastructure that closes the loop between real-world test results and new model deployments.
  • Use engineering expertise to help vehicles learn from data at scale, improving safety and performance.
  • Mentor junior engineers and contribute to defining best practices for data-centric development.

Requirements

  • Bachelor's degree or higher in an engineering field such as Computer Science, Electrical Engineering, or Software Engineering.
  • 3–5 years of experience in software or data infrastructure engineering.
  • Expertise in building and scaling data pipelines, distributed systems, or ML infrastructure.
  • Proficiency in Python and strong knowledge of data frameworks (Spark, Airflow, Kafka, etc.).
  • Experience working with large-scale datasets and understanding data-driven development cycles.
  • Familiarity with machine learning workflows or model training/deployment, especially automation of those processes.
  • Strong systems thinking and ability to work across multiple parts of the stack (data, infra, and ML).
  • Interest in seeing the direct impact of infrastructure work on vehicle performance.

Nice to Have

  • Experience with autonomous vehicle (AV) or robotics systems.
  • Previous work on ML platforms for large-scale products (e.g., Ads, Recommendation, or Autonomy pipelines).
  • Experience with highly automated ML training workflows.
  • Prior contributions to systems that close data-driven model iteration loops (a "data flywheel").
  • Ability to move fast, learn quickly, and mentor others while growing with the team.

Compensation

  • Base salary range: $125,000 - $222,000 USD annually.
  • Equity; comprehensive health, dental, vision, life, and disability insurance; 401(k) with employer match; learning and wellness stipends; paid time off.

Skills

Python | Spark | Airflow | Kafka | Distributed Systems | ETL | Machine Learning | Data Pipelines | ML Infrastructure

Bioinformatics Engineer

Develop and optimize Nextflow-based bioinformatics pipelines for high-throughput sequencing analysis on Google Cloud Platform. Requires 3+ years of production pipeline experience, Nextflow proficiency, and strong genomics analysis skills.

125k – 150k | Rockville, MD | Data Engineering | On-site | 3+ YOE | R | GCP

Analytics Engineer

Builds and maintains scalable data pipelines, models, and warehouses to enable business intelligence and decision-making. Collaborates with teams on data integration, governance, and self-serve analytics using SQL, dbt, Python, and BI tools. Requires 3+ years of experience.

125k – 215k | New York, NY | Data Engineering | Hybrid | 3+ YOE | SQL | dbt

Software Engineer II, Big Data, tvScientific

Design and implement scalable data infrastructure using Spark/Scala on AWS. Build data pipelines, knowledge graphs, and APIs to support a high-growth CTV advertising platform.

124k – 255k | San Francisco, CA | Data Engineering | Remote | AWS | SQL

Clinical Data Manager

Leads end-to-end clinical research data management: building scalable research databases, automating EHR-to-EDC integrations via APIs, ensuring data quality through QC and cleaning, and performing light statistical analysis. Requires 5+ years of experience with SQL/Python, EHR datasets, and data standardization.

127k – 183k | United States | Data Engineering | Remote | 5+ YOE | R | SQL

Data Engineer

Build and maintain scalable data models, pipelines, and Core Data tables to transform raw data into actionable insights. Collaborate with data scientists and business teams using SQL, dbt, Snowflake, Airflow, and Python.

121k – 151k | San Francisco, CA | Data Engineering | Hybrid | 3+ YOE | SQL | dbt