Responsibilities

Design, build, and maintain production services that deliver health features, in close collaboration with Applied ML Scientists and ML Research Engineers.
Lead the architecture and development of scalable ML inference systems, APIs, and backend services optimized for reliability, latency, and cost efficiency.
Collaborate with Data Platform teams to improve ML data pipelines, tooling, feature delivery systems, and validation frameworks that support robust model performance.
Work alongside Applied ML Scientists to translate research prototypes into production systems that can be deployed, monitored, and operated at scale.
Partner with the Digital Health team on algorithmic performance specifications, validation and verification planning, and the design of SPA or algorithm validation studies.
Drive operational excellence through monitoring, observability, incident response, and reliability improvements for ML-powered services.
Collaborate with researchers, product teams, and engineering stakeholders to align platform investments with health insights and member impact.
Participate in on-call rotations for ML and data services, ensuring uptime, performance, and reliability in production environments.
Provide technical leadership through architecture reviews, engineering standards, mentorship, and cross-functional collaboration.

Requirements

Bachelor's degree in Computer Science, Software Engineering, Data Science, Applied Mathematics, or a related field (Master's preferred).
7+ years of professional experience as a Software Engineer, Machine Learning Engineer, Platform Engineer, or in a related role, building large-scale distributed systems and/or production ML platforms.
Strong coding skills in Python with a track record of writing clean, well-tested, production-quality code.
Strong fundamentals in backend and service development, including APIs, distributed systems, reliability, monitoring, debugging, and performance optimization.
Experience designing, deploying, and operating ML inference systems at scale (real-time streaming and/or large-scale batch).
Experience building and maintaining distributed systems, event-driven architectures, or high-throughput data processing platforms.
Experience deploying and operating services on cloud platforms (AWS or GCP), including Kubernetes, CI/CD pipelines, infrastructure automation, and observability tooling.
Experience partnering with data science or machine learning teams to productionize models, algorithms, and data-driven features.
Familiarity with applied machine learning concepts, model evaluation, experimentation, and performance validation.
Demonstrated technical leadership through architecture and design ownership, setting engineering standards, and raising quality through reviews and mentorship.
Proven track record of driving measurable improvements in system performance, reliability, scalability, and/or cost at scale, while influencing cross-functional technical direction.

Nice-to-Haves

Experience processing time-series, streaming, sensor, wearable, physiological, or other high-volume data sources.
Experience developing software in a regulated or quality-managed environment (Software as a Medical Device (SaMD), medical devices, healthcare, fintech, or similarly regulated domains).