Senior Software Engineer – Foundational Data Systems for AI

Build foundational data systems for AI at scale, including global metadata, adaptive engines, and intelligent data layouts using distributed systems, columnar formats, and languages like Java/Rust/Go/C++.

190k – 250k | San Francisco, CA | Data Engineering | Onsite | 5+ YOE

About the role

What You’ll Build

Global Metadata Substrate: Architect the transactional and metadata substrate that supports time-travel, schema evolution, and atomic consistency across petabyte-scale tabular datasets.
Adaptive Engines: Build systems that reorganize data autonomously, learning from access patterns and workloads to maintain peak efficiency without manual tuning.
Intelligent Data Layouts: Optimize bit-level organization (encoding, compression, layout) to extract maximal signal per byte read.
Autonomous Compute Pipelines: Develop distributed compute systems that scale predictively, adapt to dynamic load, and maintain reliability under failure.
Research to Production: Implement new algorithms in compression, representation, and optimization emerging from ongoing research. Publishing and open-sourcing your work are encouraged.
Latency as Intelligence: Design for minimal time between question and insight, enabling models and humans to learn faster from data.

What You Bring

Depth in distributed systems: consensus, partitioning, replication, fault tolerance.
Experience with columnar formats such as Parquet or ORC and low-level encoding strategies.
Understanding of metadata-driven architectures and adaptive query planning.
Production experience with Spark, Flink, or custom distributed engines on cloud object storage.
Proficiency in Java, Rust, Go, or C++ with an emphasis on clarity and quality.
Curiosity about the theory and mathematics of compression, entropy, and learning efficiency.
A builder’s mindset: pragmatic, rigorous, and grounded in long-term systems thinking.

Bonus

Familiarity with Iceberg, Delta Lake, or Hudi.
Research or open-source contributions in compression, indexing, or distributed computation.
Interest in how data representation affects training dynamics and model reasoning efficiency.

Compensation & Benefits

Competitive salary, meaningful equity, and substantial bonus for top performers.
Flexible time off plus comprehensive health coverage for you and your family.
Support for research, publication, and deep technical exploration.

Skills

Distributed Systems, Parquet, ORC, Spark, Flink, Java, Rust, Go, C++, Iceberg, Delta Lake, Hudi
