Staff Data Engineer

175k – 218k | Bozeman, MT | Missoula, MT | Austin, TX | Charlotte, NC | Hybrid | 12+ YOE
Summary

Designs and evolves a scalable Iceberg-based lakehouse architecture, along with metadata governance and security controls, for analytics, product, and AI systems. Requires 12+ years of experience with Python, SQL, Airflow, and major cloud platforms.

About the role

Responsibilities

Technical Leadership & Architecture

  • Design and evolve the Iceberg-based lakehouse architecture to balance scalability, cost, performance, and maintainability.
  • Define and promote standards for table design, partitioning, schema evolution, optimization, and data layout.
  • Lead architectural efforts spanning batch, streaming, and event-driven data processing where they deliver business value.
  • Drive the design and delivery of complex, cross-team initiatives, enabling teams to move independently within established architectural guidance.
  • Evaluate and integrate new technologies, weighing build-versus-buy trade-offs.

Metadata, Governance & Open Standards

  • Define how datasets, pipelines, features, and models are described, related, and governed using shared metadata.
  • Lead the adoption and integration of open-source metadata and catalog tools (e.g., OpenMetadata).
  • Establish metadata standards that enable self-service analytics, governance, and AI readiness.
  • Partner with BI and Analytics to ensure domain models are clearly documented and aligned to business language.
  • Collaborate with Data Science to ensure model inputs, features, and outputs are traceable, explainable, and reusable.

Security, Access Control & Compliance

  • Design and evolve security and access-control models for Apache Iceberg, including table-, column-, and row-level controls.
  • Partner with Security and Platform teams to embed policy enforcement directly into data access paths.
  • Drive metadata-driven authorization patterns that scale across tools and user groups.
  • Ensure privacy, compliance, and regulatory requirements are incorporated into platform design.
  • Balance strong security guarantees with usability to support safe self-service.

Platform Reliability & Operations

  • Build and maintain automation for compaction, retention, lifecycle management, and cost controls.
  • Establish observability standards that connect pipeline health, data quality, and reliability metrics.
  • Provide architectural oversight during critical incidents and drive long-term reduction of "Keep the Lights On" (KTLO) work.
  • Recommend tooling and process improvements based on industry standards and operational experience.

Organizational Impact & Collaboration

  • Align technical work with business priorities by understanding how data supports onX products and customer outcomes.
  • Communicate complex technical concepts clearly to engineers, product partners, and leadership.
  • Lead and participate in architecture and design reviews, setting a high bar for technical rigor.
  • Foster strong cross-team collaboration across Data Engineering, Platform, Security, Analytics, and Data Science.
  • Mentor senior and mid-level engineers, raising the technical bar across the team.

Requirements

Required

  • Bachelor’s degree in Computer Science or equivalent experience.
  • Deep industry experience (typically 12+ years) building and operating large-scale data systems.
  • Deep expertise in distributed data systems and data architecture.
  • Strong experience with Apache Iceberg and similar table formats (Delta Lake, Hudi).
  • Proven experience designing secure and governed data platforms.
  • Expertise in Python, SQL, and orchestration patterns (e.g., Airflow).
  • Experience working with data ecosystems, including metadata, catalog, or governance tooling.
  • Strong written and verbal communication skills.
  • Permanent U.S. work authorization.

Cloud & Platform Experience

  • Deep experience in at least one major cloud environment (GCP, AWS, or Azure).
  • Familiarity with cloud-native data services such as query engines, stream/batch processing systems, and object storage–based lakehouses.
  • Comfort with infrastructure-as-code and automated platform management.

Compensation

  • Base salary: $175,000 - $218,000 upon hire (varies based on experience, skills, certifications, and education).
  • Full-time employees are eligible for common share options (subject to a vesting schedule) and a potential annual bonus of 10%, based on company performance.

Skills

Apache Iceberg, Delta Lake, Hudi, Python, SQL, Airflow, GCP, AWS, Azure, OpenMetadata, lakehouse, data governance, metadata, Iceberg, distributed systems