AI-Native Data Platform Engineer

Designs and owns canonical data foundations, ingestion pipelines, and AI-ready schemas for financial AI systems in wealth management. Requires 5+ years of data platform experience, SQL/Python expertise, custodial data knowledge, and AWS proficiency.

New York, NY | Data Engineering | Hybrid | 5+ YOE

About the role

Responsibilities

  • Design scalable ingestion pipelines across custodians (Schwab, Fidelity, Pershing, etc.) and internal financial systems
  • Build and evolve canonical models for accounts, positions, transactions, balances, corporate actions, and household hierarchies
  • Define financial data ontology and enforce strong data contracts across services
  • Implement reconciliation frameworks and golden-source resolution across multi-vendor datasets
  • Engineer AI-ready data layers optimized for embeddings, vector search, and RAG architectures
  • Structure financial datasets to improve prompt reliability and LLM output consistency
  • Architect closed-loop, agent-driven systems that monitor, reason over, and autonomously remediate data inconsistencies
  • Implement observability, lineage, governance, and fine-grained access controls across regulated datasets

Requirements

  • 5+ years building production-grade data platforms
  • Deep SQL expertise and strong Python for data engineering
  • Experience designing canonical schemas and resolving vendor data inconsistencies
  • Strong understanding of custodial financial data (positions, trades, balances, performance, corporate actions)
  • Familiarity with embeddings, vector databases, and retrieval architectures
  • Exposure to prompt engineering and structured context design for LLM systems
  • Knowledge of MLOps fundamentals (versioning, monitoring, reproducibility)
  • Comfortable with AWS data services (S3, Lambda, ECS, Glue, Redshift, OpenSearch) and event-driven orchestration
  • Strong ownership mindset and systems-level thinking

Nice-to-Haves

  • Wealth management or capital markets background
  • Experience integrating OpenAI or Anthropic APIs into production systems
  • Experience designing retrieval schemas for AI agents
  • Experience with authorization and policy platforms (e.g., Oso, Auth0)
  • Experience implementing fine-grained access control for AI-driven systems
  • Familiarity with GitHub-based CI/CD workflows and automation
  • Experience with data governance, lineage, and compliance controls

Benefits

  • Full health benefits + 401(k) matching & Roth IRA options
  • Unlimited PTO

Skills

SQL, Python, AWS, Redshift, Glue, S3, AWS Lambda, OpenSearch, Embeddings, Vector Databases, RAG, MLOps, Prompt Engineering

Software Engineer, Storage

Software Engineer on the Storage team owning the data layer (databases, caches, scaling strategies) that underpins all Cursor products. Design multi-database architectures, build query guardrails, define storage best practices, and own cache infrastructure for reliability and growth.

San Francisco, CA +1 | Data Engineering | On-site | 5+ YOE | OLTP | MySQL

Healthcare Data Analyst

Create advanced SQL/Spark SQL queries and prompt-engineered LLM workflows to transform healthcare claims data into clinical insights and automated policy tools. Requires 3-5 years of SQL experience plus 2-3 years of healthcare experience.

140k – 170k | United States | Data Engineering | Remote | 3+ YOE | SQL | Claude

Analytics Engineer

Build and maintain data models, pipelines, and dashboards that power customer experience and compliance operations. Partner with CX and compliance teams to deliver trusted, self-serve analytics.

152k – 179k | United States | Data Engineering | Remote | 3+ YOE | SQL | dbt

Data Engineer

Senior Data Engineer building scalable data pipelines and infrastructure on AWS using Spark, Metaflow, and container orchestration. Requires 5+ years of experience designing distributed data systems.

145k – 190k | United States | Data Engineering | Remote | 5+ YOE | AWS | SQL

Software Engineer, Sensor Integration

Build and maintain ingestion pipelines that convert large-scale geospatial sensor data (LiDAR, imagery) into standardized formats for ML training and product use. Requires strong Python skills, comfort with undocumented formats, and distributed systems experience.

San Francisco, CA | Data Engineering | Hybrid | C++ | GDAL