AI-Native Data Platform Engineer

Designs and owns canonical data foundations, ingestion pipelines, and AI-ready schemas for financial AI systems in wealth management. Requires 5+ years of data platform experience, SQL/Python expertise, knowledge of custodial data, and AWS proficiency.

New York, NY · Data Engineering · Hybrid · 5+ YOE


About the role

Responsibilities

Design scalable ingestion pipelines across custodians (Schwab, Fidelity, Pershing, etc.) and internal financial systems
Build and evolve canonical models for accounts, positions, transactions, balances, corporate actions, and household hierarchies
Define financial data ontology and enforce strong data contracts across services
Implement reconciliation frameworks and golden-source resolution across multi-vendor datasets
Engineer AI-ready data layers optimized for embeddings, vector search, and RAG architectures
Structure financial datasets to improve prompt reliability and LLM output consistency
Architect closed-loop, agent-driven systems that monitor, reason over, and autonomously remediate data inconsistencies
Implement observability, lineage, governance, and fine-grained access controls across regulated datasets

Requirements

5+ years building production-grade data platforms
Deep SQL expertise and strong Python skills for data engineering
Experience designing canonical schemas and resolving vendor data inconsistencies
Strong understanding of custodial financial data (positions, trades, balances, performance, corporate actions)
Familiarity with embeddings, vector databases, and retrieval architectures
Exposure to prompt engineering and structured context design for LLM systems
Knowledge of MLOps fundamentals (versioning, monitoring, reproducibility)
Comfortable with AWS data services (S3, Lambda, ECS, Glue, Redshift, OpenSearch) and event-driven orchestration
Strong ownership mindset and systems-level thinking

Nice-to-Haves

Wealth management or capital markets background
Experience integrating OpenAI or Anthropic APIs into production systems
Experience designing retrieval schemas for AI agents
Experience with authorization and policy platforms (e.g., Oso, Auth0)
Experience implementing fine-grained access control for AI-driven systems
Familiarity with GitHub-based CI/CD workflows and automation
Experience with data governance, lineage, and compliance controls

Benefits

Full health benefits + 401(k) matching & Roth IRA options
Unlimited PTO

Skills

SQL, Python, AWS, Redshift, Glue, S3, AWS Lambda, OpenSearch, Embeddings, Vector Databases, RAG, MLOps, Prompt Engineering

Similar roles

Data Engineering jobs

Cursor

Software Engineer, Storage

Software Engineer on the Storage team owning the data layer (databases, caches, scaling strategies) that underpins all Cursor products. Design multi-database architectures, build query guardrails, define storage best practices, and own cache infrastructure for reliability and growth.

San Francisco, CA +1 · Data Engineering · On-site · 5+ YOE · OLTP · MySQL

Machinify

Healthcare Data Analyst

Create advanced SQL/Spark SQL queries and prompt-engineered LLM workflows to transform healthcare claims data into clinical insights and automated policy tools. Requires 3–5 years of SQL experience plus 2–3 years of healthcare experience.

140k – 170k · United States · Data Engineering · Remote · 3+ YOE · SQL · Claude

Coinbase

Analytics Engineer

Build and maintain data models, pipelines, and dashboards that power customer experience and compliance operations. Partner with CX and compliance teams to deliver trusted, self-serve analytics.

152k – 179k · United States · Data Engineering · Remote · 3+ YOE · SQL · dbt

Rad AI

Data Engineer

Senior Data Engineer building scalable data pipelines and infrastructure on AWS using Spark, Metaflow, and container orchestration. Requires 5+ years of experience designing distributed data systems.

145k – 190k · United States · Data Engineering · Remote · 5+ YOE · AWS · SQL

Mach9

Software Engineer, Sensor Integration

Build and maintain ingestion pipelines that convert large-scale geospatial sensor data (LiDAR, imagery) into standardized formats for ML training and product use. Requires strong Python skills, comfort with undocumented formats, and distributed systems experience.

San Francisco, CA · Data Engineering · Hybrid · C++ · GDAL