Senior Software Engineer, ML Platform
Build and maintain a scalable ML platform for model experimentation, training, evaluation, and inference, along with a feature store, to power underwriting products. Requires 5+ years of experience with Python, ML stacks such as Databricks/AWS, and MLOps systems.
What You'll Do
- Turn notebooks into software. Decompose data scientist training/inference notebooks into reusable, tested components (libraries, pipelines, templates) with clear interfaces and documentation.
- Create developer-friendly ML abstractions. Build SDKs, CLIs, and templates that make it simple to define features, train/evaluate models, and deploy to batch or real-time targets with minimal boilerplate.
- Build our real-time ML inference platform. Stand up and scale low-latency model serving.
- Expand batch ML inference. Improve scheduling, parallelism, cost controls, observability, and failure/rollback for large-scale batch scoring and post-processing.
- Own and expand the feature store. Design offline/online feature definitions, support high read/write throughput, and keep offline and online semantics consistent.
- Platform reliability and observability. Instrument training/inference for latency, throughput, accuracy, drift, data quality, and cost; build alerting and dashboards; drive incident response and postmortems.
- Underwriting infrastructure partnership. Support production batch and real-time underwriting systems alongside Data Science; collaborate on model interfaces, SLAs, safety checks, and product integrations.
What We Are Looking For
- 5+ years of software engineering experience, including experience on ML platform/MLOps systems (training, deployment, and/or feature pipelines).
- Strong Python; solid software design and testing fundamentals. Proficiency with SQL; hands-on Spark/PySpark experience.
- Knowledge of ML fundamentals: probability & statistics, supervised vs. unsupervised learning, bias/variance & regularization, feature engineering, model evaluation metrics, validation strategies, and production concerns like drift, stability, and monitoring.
- Expertise with modern data/ML stacks: AWS, Databricks (workflows, lakehouse, MLflow/registry, Model Serving), and Airflow (or equivalent orchestration).
- Experience building real-time systems (service design, caching, rate limiting, backpressure) and batch pipelines at scale.
- Practical knowledge of feature-store concepts (offline/online stores, backfills, point-in-time correctness), model registries, experiment tracking, and evaluation frameworks.
- Strong problem-solving skills and a proactive attitude toward ownership and platform health.
- Excellent communication and collaboration skills, especially in cross-functional settings.
Bonus Points
- Databricks experience (MLflow, Model Serving).
- Experience with feature stores (e.g., Tecton, Feast) and streaming (Kafka/Kinesis).
- Experience with fintech, risk, or underwriting systems; familiarity with model safety checks, rejection/override flows, and auditability.
- Background with A/B testing platforms, shadow/canary deployments, and automated rollback.
- Experience with low-latency inference systems.
What We Offer
Salary Range: $230k - $265k
- Equity grant
- Medical, dental & vision insurance
- Work from home flexibility
- Unlimited PTO
- Commuter benefits
- Free lunches
- Paid parental leave
- 401(k)
- Employee assistance program