Staff Data Architect
Jellyfish is seeking a Staff/Lead Data Architect to design, automate, and scale its next-generation data platform. The role involves maturing core data models, automating environment boundaries, and building advanced observability and cost attribution into the data pipeline architecture.
What you’ll actually be doing:
- Architectural Evolution & Blueprinting – You’ll own the blueprint for the next-generation Jellyfish data platform. You’ll tackle our existing data footprint, refactoring pipelines and structures into efficient, scalable patterns (such as Medallion-style bronze/silver/gold architectures or unified semantic layers).
- Automated Data Governance – You’ll design and automate strict, code-driven environment isolation boundaries. You’ll ensure that dev, staging, and production data catalogs (and their underlying cloud storage) stay fully separated, eliminating the risk of "fat-finger" data drops or PII leakage.
- Orchestration & Compute Scaling – You’ll lead the modernization of our workflow orchestration and distributed compute engines. You’ll focus on slashing engine runtime overhead, eliminating API bottlenecks, and streamlining heavy parallelized or mapped data tasks.
- Modern Integration Middleware – You'll partner with application teams to ensure our React frontends and backend services hit highly secure, cached API and Backend-for-Frontend (BFF) layers rather than querying raw data services directly, protecting our warehouses from concurrency spikes.
- Proactive Data Observability & FinOps – You’ll build and maintain granular data-quality monitors and cost-allocation frameworks. You won't just track overall warehouse spend; you’ll implement systems to map execution cost and token usage directly down to the tenant, team, or user level.
You’re a great fit if:
- Data Tooling Fluency – You have deep, production-level experience with Python, advanced SQL, and modern data stack essentials. You are fluent with programmatic orchestrators (like Prefect, Dagster, or Airflow) and modern data validation libraries (like Pydantic v2).
- Catalog & Warehouse Practitioner – You have hands-on mastery of enterprise-scale data platforms and governance layers (e.g., Snowflake, Databricks Unity Catalog, BigQuery) and know exactly how to map environments to catalogs and data-quality rules to schemas.
- Automation Mindset – You look at a manual data backfill or a clicked-together database permission and immediately think about how to automate it via Infrastructure-as-Code (Terraform) or programmatic workflows.
- Collaborative Systems Thinker – You don’t design in a vacuum. You are excellent at documenting data lineage, mentoring data engineers, and collaborating across DevOps and Product teams to align infrastructure with business goals.
- Pragmatic Problem Solver – You know the difference between data-quality stages (e.g., raw vs. curated layers) and software development lifecycle environments (dev, staging, production). You know when a "perfect" distributed cluster is required and when a "good enough" cached view keeps the business moving.
Bonus Points:
- You’ve survived (and thrived in) a rapidly scaling B2B SaaS startup handling massive multi-tenant data sets.
- You have strong opinions on the future of Git-like data versioning and zero-copy cloning (e.g., Iceberg, Nessie).
- You’ve managed complex cloud-billing attributions or scaled heavy LLM/vector-embedding data workloads and lived to tell the tale.
Occasional travel may be required. Applicants must be authorized to work for any employer in the US. We are unable to sponsor or take over sponsorship of an employment visa at this time.
Staff Engineer - Data Platform
Staff-level technical lead and architect for Haus's data ingestion and normalization platform. Owns schema evolution, data contracts, data-quality (DQ) frameworks, lineage, and pipeline observability in a GCP/BigQuery/dbt stack. Partners with Data Science (DS) and Product teams.
Senior Software Engineer
Senior Software Engineer building and scaling Chime's data platform, ETL pipelines, and distributed data infrastructure. Requires a Master's degree and 3+ years of experience with AWS/GCP, Spark/Trino, Kubernetes, and CI/CD.
Data Engineer, Machine Learning
Build and maintain production data pipelines that prepare conversational, voice, and multimodal data for ML model training and evaluation. Partner closely with ML engineers to deliver high-quality, versioned datasets and infrastructure.