Staff Software Engineer – Foundational Data Systems for AI

Lead architecture of exabyte-scale distributed data systems that self-optimize via compression, metadata, and intelligent layouts to power efficient AI infrastructure. Requires deep expertise in distributed systems, low-level data representation, and leadership of large-scale production systems.

240k – 290k | San Francisco, CA | Data Engineering | Onsite | 8+ YOE

About the role

What You’ll Build

Global Metadata Substrate: Define and evolve the global metadata and transactional substrate that powers atomic consistency and schema evolution across exabyte-scale data systems.
Adaptive Engines: Architect self-optimizing systems that continuously reorganize and compress data based on access patterns, achieving order-of-magnitude efficiency gains.
Intelligent Data Layouts: Pioneer new approaches to encoding and layout that push theoretical limits of signal per byte read.
Autonomous Compute Pipelines: Lead development of distributed compute platforms that scale predictively and maintain reliability under extreme load and failure conditions.
Research to Production: Collaborate with Granica Research to translate advances in compression and probabilistic modeling into production-grade, industry-defining systems.
Latency as Intelligence: Drive system-wide initiatives to minimize latency from insight to decision, enabling faster model learning and data-driven reasoning.

What You Bring

Mastery of distributed systems: consensus, replication, consistency, and performance at scale.
Proven track record of architecting and delivering large-scale data or compute systems with measurable 10× impact.
Expertise with columnar formats and low-level data representation techniques.
Deep production experience with Spark, Flink, or next-generation compute frameworks.
Fluency in Java, Rust, Go, or C++, emphasizing simplicity, performance, and maintainability.
Demonstrated leadership—mentoring senior engineers, influencing architecture, and scaling technical excellence.
Systems intuition rooted in theory: compression, entropy, and information efficiency.

Bonus

Familiarity with Iceberg, Delta Lake, or Hudi.
Published or open-source contributions in distributed systems, compression, or data representation.
Passion for bridging research and production to define the next frontier of efficient AI infrastructure.

Skills

Distributed Systems, Java, Rust, Go, C++, Spark, Flink, Columnar Formats, Consensus Algorithms, Data Compression
