Senior Software Engineer, ML Platform

230k – 265k · San Francisco, CA · ML Engineering · Hybrid · 5+ YOE · Jan 5

Summary

Build and maintain a scalable ML platform for model experimentation, training, evaluation, and inference, along with a feature store, to power underwriting products. Requires 5+ years of experience with Python, ML stacks such as Databricks/AWS, and MLOps systems.

About the role

What You'll Do

Turn notebooks into software. Decompose data scientist training/inference notebooks into reusable, tested components (libraries, pipelines, templates) with clear interfaces and documentation.
Create developer-friendly ML abstractions. Build SDKs, CLIs, and templates that make it simple to define features, train/evaluate models, and deploy to batch or real-time targets with minimal boilerplate.
Build our real-time ML inference platform. Stand up and scale low-latency model serving.
Expand batch ML inference. Improve scheduling, parallelism, cost controls, observability, and failure/rollback for large-scale batch scoring and post-processing.
Own and expand the feature store. Design offline/online feature definitions, support high read/write throughput, and keep offline/online semantics consistent.
Platform reliability and observability. Instrument training/inference for latency, throughput, accuracy, drift, data quality, and cost; build alerting and dashboards; drive incident response and postmortems.
Underwriting infrastructure partnership. Support production batch and real-time underwriting systems in collaboration with Data Science; collaborate on model interfaces, SLAs, safety checks, and product integrations.

What We Are Looking For

5+ years of software engineering experience, including experience on ML platform/MLOps systems (training, deployment, and/or feature pipelines).
Strong Python; solid software design and testing fundamentals. Proficiency with SQL; hands-on Spark/PySpark experience.
Knowledge of ML fundamentals—probability & statistics, supervised vs. unsupervised learning, bias/variance & regularization, feature engineering, model evaluation metrics, validation strategies, and production concerns like drift, stability, and monitoring.
Expertise with modern data/ML stacks—AWS, Databricks (workflows, lakehouse, MLflow/registry, Model Serving), and Airflow (or equivalent orchestration).
Experience building real-time systems (service design, caching, rate limiting, backpressure) and batch pipelines at scale.
Practical knowledge of feature-store concepts (offline/online stores, backfills, point-in-time correctness), model registries, experiment tracking, and evaluation frameworks.
Strong problem-solving skills and a proactive attitude toward ownership and platform health.
Excellent communication and collaboration skills, especially in cross-functional settings.

Bonus Points

Databricks experience (MLflow, Model Serving).
Experience with feature stores (e.g., Tecton, Feast) and streaming (Kafka/Kinesis).
Experience with fintech, risk, or underwriting systems; familiarity with model safety checks, rejection/override flows, and auditability.
Background with A/B testing platforms, shadow/canary deployments, and automated rollback.
Experience with low-latency inference systems.

What We Offer

Salary Range: $230k - $265k
Equity grant
Medical, dental & vision insurance
Work from home flexibility
Unlimited PTO
Commuter benefits
Free lunches
Paid parental leave
401(k)
Employee assistance program

Skills

Python, SQL, PySpark, Spark, AWS, Databricks, Airflow, MLflow, Kubernetes, Kafka, Tecton, Feast
