Software Engineer, Sensor Integration

Build and maintain ingestion pipelines that convert large-scale geospatial sensor data (LiDAR, imagery) into standardized formats for ML training and product use. Requires strong Python skills, comfort with undocumented formats, and distributed systems experience.

San Francisco, CA · Data Engineering · Hybrid

About the role

Responsibilities

  • Own the ingestion pipelines that convert point clouds and imagery from hardware vendors into Mach9's standard internal format
  • Reverse-engineer new vendor formats and updates — often working only with sparse or missing documentation — to expand what data Mach9 can take in
  • Build agentic systems to automatically triage failures and reformat data
  • Build automated checks and regression testing to guarantee the consistency of our data
  • Optimize the performance of our processing and storage across massive geospatial datasets in the cloud
  • Work directly with customers and partners to unblock critical customer projects

Requirements

  • Strong software development and debugging skills
  • Experience building production software in Python
  • Comfort operating with ambiguity — ability to dig into undocumented or messy data formats and reverse-engineer them
  • Strong communication skills, with the ability to work across ML, product, and customer success teams
  • A foundation in parallel computing or distributed systems
  • Bachelor's degree in Computer Science, Engineering, or equivalent experience

Nice-to-Haves

  • Experience building agentic systems and setting up agent harnesses — orchestrating LLM-driven workflows for triage, debugging, or automated code patching
  • Understanding of geospatial data formats (e.g., LAS/LAZ, COPC, E57, GeoTIFF, Shapefiles) and tooling (e.g., GDAL, PDAL, untwine, laz-perf)
  • Expertise designing and managing data schemas and storage systems for geospatial data (e.g., Postgres/PostGIS, AWS S3)
  • Experience with large-scale data processing frameworks and cloud platforms (e.g., Spark, AWS Batch)
  • Familiarity with coordinate reference systems and transforms (CRS, WKT, pyproj, affine transforms)
  • Experience building data versioning, lineage, or artifact-tracking systems
  • Experience operating data pipelines that feed ML training and inference
  • Familiarity with C++

Skills

Python · Parallel Computing · Distributed Systems · GDAL · PDAL · PostGIS · AWS S3 · Spark · AWS Batch · C++

Software Engineer, Storage

Software Engineer on the Storage team owning the data layer (databases, caches, scaling strategies) that underpins all Cursor products. Design multi-database architectures, build query guardrails, define storage best practices, and own cache infrastructure for reliability and growth.

San Francisco, CA +1 · Data Engineering · On-site · 5+ YOE · OLTP · MySQL

Healthcare Data Analyst

Create advanced SQL/Spark SQL queries and prompt-engineered LLM workflows to transform healthcare claims data into clinical insights and automated policy tools. Requires 3-5 years of SQL experience plus 2-3 years of healthcare experience.

140k – 170k · United States · Data Engineering · Remote · 3+ YOE · SQL · Claude

Analytics Engineer

Build and maintain data models, pipelines, and dashboards that power customer experience and compliance operations. Partner with CX and compliance teams to deliver trusted, self-serve analytics.

152k – 179k · United States · Data Engineering · Remote · 3+ YOE · SQL · dbt

Data Engineer

Senior Data Engineer building scalable data pipelines and infrastructure on AWS using Spark, Metaflow, and container orchestration. Requires 5+ years of experience designing distributed data systems.

145k – 190k · United States · Data Engineering · Remote · 5+ YOE · AWS · SQL

Data Engineer

Design, build, and maintain data pipelines for biomedical and clinical research datasets. Work with scientists and researchers to deliver accessible, well-governed data products using Python, SQL, and ETL/ELT processes.

Rockville, MD · Data Engineering · On-site · SQL · ETL