Software Engineer, Data Foundations

Build and scale data ingestion pipelines and connectors for enterprise SaaS apps, transform unstructured data for AI search and agents, and ensure reliability and security at petabyte scale. Requires 3+ years of backend or data infrastructure experience with distributed systems.

140k – 265k | United States | Data Engineering | Hybrid | 3+ YOE

About the role

You will work on:

Ingestion & Connectivity

Build and scale connectors to SaaS and on-prem systems (Google Workspace, Microsoft 365, Slack, Salesforce, Jira, ServiceNow, GitHub, etc.).
Handle full syncs, low-latency incremental updates via webhooks/APIs, rate-limiting, and complex authentication flows.
Build advanced datasource capabilities such as actions, live-fetch, and query language support.

Data Processing & Modeling

Transform raw, unstructured enterprise content into rich, structured, permission-aware representations optimized for search and LLM reasoning.
Design document schemas and enrichment pipelines (entity extraction, access-graph propagation, redactions, etc.).
Expand AI products through deep integrations for task automation, complex queries, and live data enhancement.

Reliability & Distributed Systems

Own end-to-end correctness, freshness, and performance for petabyte-scale data flows.
Solve problems in ordering, idempotency, exactly-once processing, backpressure, and retries across distributed queues, workers, and storage.

Security & Permissions

Preserve fine-grained ACLs, deletions, and sensitivity constraints so AI answers are grounded in user permissions.

Cross-Functional Impact

Partner with Search Serving, Product, Platforms, and Security teams to define how enterprise context is exposed to LLMs and agents.
Improve observability, alerting, and automation for larger customers and data sources.

About you:

3+ years building production backend or data infrastructure systems (Java, Go, C++, Python, etc.).
Hands-on experience with distributed systems, data pipelines, queues, and large-scale storage (SQL/NoSQL).
Think in SLOs, error budgets, failure modes, and correctness guarantees.
Comfortable with strict consistency and permission-modeling challenges.
Prior work on enterprise connectors, search/indexing, information retrieval, or security-sensitive systems is a strong plus.
Passionate about trustworthy AI via rock-solid data foundations.
Power user of LLMs and AI tools.

Compensation & Benefits

Base salary range: $140,000 - $265,000 annually (varies by location, level, knowledge, skills, and experience). Eligible for variable compensation, equity, and benefits including medical, vision, dental, time off, 401k, stipends, events, and daily lunches.

Skills

Java, Go, C++, Python, SQL, NoSQL, Distributed Systems, Data Pipelines, Kubernetes, Apache Kafka
