Staff Data Engineer

Builds and maintains scalable data pipelines for processing massive lead datasets and real-time intent signals. Owns data ingestion, modeling, ML dataset preparation, quality monitoring, and optimization in a fast-paced AI startup.

200k – 300k · San Francisco, CA; New York, NY · Data Engineering · On-site


About the role

Responsibilities

Design, build, and maintain scalable data pipelines that process and transform large volumes of structured and unstructured data
Manage ingestion from third-party APIs, internal systems, and customer datasets
Develop and maintain data models, data schemas, and storage systems optimized for ML and product performance
Collaborate with ML engineers to prepare model-ready datasets, embeddings, feature stores, and evaluation data
Implement data quality monitoring, validation, and observability
Work closely with product engineers to support new features that rely on complex data flows
Optimize systems for performance, cost, and reliability
Contribute to early architecture decisions, infrastructure design, and best practices for data governance
Build tooling that enables the entire team to access clean, well-structured data

Requirements

3+ years of experience as a Data Engineer
Proficiency in Python, SQL, and modern data tooling (dbt, Airflow, Dagster, or similar)
Comfort working in fast, ambiguous environments
Experience designing and operating ETL/ELT pipelines in production
Experience with cloud platforms (AWS, GCP, or Azure)
Familiarity with data lakes, warehouses, and vector databases
Experience integrating APIs and working with semi-structured data (JSON, logs, event streams)
Strong understanding of data modeling and optimization

Nice-to-haves

Experience supporting LLMs, embeddings, or ML training pipelines
Startup experience

Skills

Python, SQL, dbt, Airflow, Dagster, ETL, ELT, AWS, GCP, Azure, Data Lakes, Data Warehouses, Vector Databases, APIs, JSON

Similar roles


Jellyfish

Staff Data Engineer

Staff Data Engineer building and scaling data pipelines, integrations, and workflow orchestration systems. Owns architecture, IaC strategy, and technical leadership across large-scale data infrastructure.

200k – 260k · United States · Data Engineering · Remote · 7+ YOE · Python, Prefect

Nuance Labs

Member of Technical Staff — ML Data Infra

Build and operate large-scale multimodal data pipelines for AI avatar model training. Design production-grade systems for petabyte-scale video, audio, and text data.

200k – 300k · Seattle, WA · Data Engineering · On-site · 5+ YOE · Ray, DVC

Imprint

Staff Data Engineer

As a Staff Data Engineer, you will architect and scale Imprint's data platform, optimizing infrastructure and driving technical excellence. You will build critical financial reporting pipelines, establish data standards, and mentor other engineers.

200k – 250k · San Francisco, CA +1 · Data Engineering · On-site · 10+ YOE · S3, SQL

Armis

Senior Staff Data Infrastructure Engineer

Lead and contribute to architectural initiatives for data infrastructure in FedRAMP environments. This role focuses on scalability, cost-efficiency, operational excellence, and security compliance for data-intensive systems.

200k – 220k · United States · Data Engineering · Remote · 7+ YOE · EMR, AWS

Jellyfish

Staff Data Architect

Jellyfish is seeking a Staff/Lead Data Architect to design, automate, and scale its next-generation data platform. The role involves maturing core data models, automating environment boundaries, and building advanced observability and cost attribution into the data pipeline architecture.

200k – 260k · United States · Data Engineering · Remote · SQL, Python