Software Development Engineer - Data Acquisition & Normalization

Builds and maintains data connectors, pipelines, and normalization services to acquire and validate identity attributes from various sources for the Identity Trust Graph. Requires 4+ years in OOP languages, containerized cloud systems like GCP/Kubernetes, and data integration experience.

169k – 193k | Mountain View, CA | Data Engineering | Onsite | 4+ YOE

About the role

Key Responsibilities

Build and maintain connectors to government registries, telcos, licensing authorities, and commercial data providers.
Standardize and reconcile heterogeneous data formats into clean schemas usable by the Identity Trust Graph.
Monitor and help resolve upstream source changes; contribute to retries, fallbacks, and error handling to improve pipeline reliability.
Contribute to the Attribute Validation Service (AVS) by adding trusted data that validates identity attributes against sources of record.
Help deliver clean and validated attribute data to downstream consumers including Wallet, Fraud, and Domains.
Assist in reporting coverage and freshness metrics to Product, Ops, and Analytics stakeholders.
Handle sensitive data in accordance with NIST, ISO 27001, and FedRAMP standards.
Write high-quality, maintainable, and well-tested code, including automated tests and observability instrumentation.
Participate in system design discussions, code reviews, and technical documentation to support team alignment.

Required Qualifications

Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience).
4+ years of experience developing web applications using OOP languages such as Java, Ruby, JavaScript, TypeScript, Go, Python, Rust, or C++.
Exposure to data acquisition or integration work, including APIs, screen scraping, ETL, or normalization pipelines.
Experience building systems with Docker, Kubernetes, or Nomad, and running services in a containerized, cloud-based, infrastructure-as-code-driven ecosystem such as GCP.
Ability to deliver features end to end, including automated test coverage, observability, monitoring, and documentation.
Ability to communicate technical tradeoffs clearly and work collaboratively within a team.
Proficiency with and strong interest in AI-assisted development tools (e.g., Claude Code or Codex) to accelerate delivery and improve code quality.

Preferred Qualifications

Familiarity with operating data pipelines under reliability and SLA requirements.
Understanding of distributed systems concepts, caching, asynchronous processing, and cloud-native patterns.
Exposure to authentication and authorization standards (OAuth2, OIDC, JWT, or custom schemes).
Familiarity with identity and credential verification systems, including data validation, proofing, or trust scoring.
Exposure to event-driven architectures (Kafka, SNS/SQS) and patterns for decoupled service communication.
Experience with cloud infrastructure (AWS, GCP, or Azure), including containerization and deployment pipelines.
Familiarity with observability, monitoring, and incident response best practices.
Awareness of compliance and security requirements for sensitive data (NIST, FedRAMP, ISO 27001).
Bonus: Exposure to FinTech, identity, or data aggregation companies (e.g., Plaid, Yodlee, Envestnet).

Skills

Java, Python, Go, JavaScript, TypeScript, Docker, Kubernetes, GCP, ETL, APIs, Kafka, OAuth2, OIDC, JWT, AWS
