
Software Engineer, Data Acquisition

293k – 385k | San Francisco, CA | Onsite | 4+ YOE
Summary

Builds and leads data acquisition systems, including web crawling, ingestion, and scalable distributed processing for model training. Requires 4+ years of experience, expertise in Kubernetes and large-scale data systems, and a BS/MS/PhD in Computer Science or a related field.

About the role

Responsibilities

  • Own and lead engineering projects in the area of data acquisition including web crawling, data ingestion, and search.
  • Collaborate with other sub-teams, such as Data Processing, Architecture, and Scaling, to ensure smooth data flow and system operability.
  • Work closely with the legal team to handle any compliance or data privacy-related matters.
  • Develop and deploy highly scalable distributed systems capable of handling petabytes of data.
  • Architect and implement algorithms for data indexing and search capabilities.
  • Build and maintain backend services for data storage, including work with key-value databases and synchronization.
  • Deploy solutions in a Kubernetes Infrastructure-as-Code environment and perform routine system checks.
  • Conduct and analyze experiments on data to provide insights into system performance.

Qualifications

  • BS/MS/PhD in Computer Science or a related field.
  • 4+ years of industry experience in software development.
  • Experience with large web crawlers is a plus.
  • Strong expertise in large stateful distributed systems and data processing.
  • Proficiency in Kubernetes and Infrastructure-as-Code concepts.
  • Willingness and enthusiasm to try new approaches and technologies.
  • Ability to handle multiple tasks and adapt to changing priorities.
  • Strong communication skills, both written and verbal.
Skills
Kubernetes, Distributed Systems, Web Crawling, Data Processing, Infrastructure as Code, Key-Value Databases, Data Indexing, Search Algorithms, Backend Services, Data Ingestion