Senior Machine Learning Operations Engineer

256k – 285k | New York City, NY | Hybrid | 5+ YOE | Jun 16

Summary

Build and operate production ML systems and platform components for healthcare technology, partnering with ML and data science teams on model deployment, observability, and reliability.

About the role

What you will do

Help ensure the reliability, performance, functionality, and cost-efficiency of Garner's production ML systems, contributing to SLOs, observability, and on-call responsibilities.
Build key components of Garner's ML platform, including data infrastructure (such as a feature store, model registry, and CI/CD for models) and standardized service patterns.
Implement ML-specific CI/CD pipelines: Help transition our deployment process from manual notebook hand-offs to automated, PR-driven CI/CD workflows that include automated data quality checks and statistical model validation prior to deployment.
Drive down cost and latency through improved architecture, hardware choices, and model optimization as appropriate.
Contribute to the workflows, standards, and KPIs that support a growing MLOps function, helping teammates and stakeholders quickly assess the health of the team's products and focus on the areas that have issues.
Help establish drift monitoring: Design and implement automated data drift and concept drift monitoring systems that alert the team when models degrade, laying the groundwork for future Continuous Training (CT) architectures.

The ideal candidate has

5+ years of software engineering experience, with meaningful time spent operating ML or data-intensive systems in production.
Hands-on experience with the modern ML production stack: model serving (e.g., SageMaker, Triton, or equivalent), feature stores, model registries, and CI/CD for ML.
Strong infrastructure and platform engineering fundamentals: Kubernetes, containerization, cloud (AWS preferred), Terraform/IaC, observability, and incident response.
Experience building ML platforms or significant components of one (not strictly consuming SaaS), with sound judgment around when to build vs. buy.
Strong collaboration with ML, data, and platform engineers, data scientists, and product engineering teams, with the ability to lead projects and influence technical decisions.
Healthcare, regulated-data, or other high-stakes production ML experience is a plus but not required.
A desire to be a part of a high-performing, mission-driven team that operates with intense urgency, a strong sense of individual accountability, and a commitment to authentic feedback.

Technologies we use

Python, Kubernetes, AWS, SageMaker, Terraform, S3, Snowflake, Airflow, Datadog

Skills

Python, Kubernetes, AWS, SageMaker, Terraform, S3, Snowflake, Airflow, Datadog, CI/CD, Feature Stores, Model Registries, Model Serving
