Staff Software Engineer, Machine Learning
Boston, MA | ML Engineering | Hybrid | 7+ YOE
Summary
Staff-level engineer building and operating production ML inference systems, data pipelines, and backend services that deliver personalized health insights from physiological data at scale.
About the role
Responsibilities
- Design, build, and maintain production services that deliver health features, in close collaboration with Applied ML Scientists and ML Research Engineers.
- Lead the architecture and development of scalable ML inference systems, APIs, and backend services optimized for reliability, latency, and cost efficiency.
- Collaborate with Data Platform teams to improve ML data pipelines, tooling, feature delivery systems, and validation frameworks that support robust model performance.
- Work alongside Applied ML Scientists to translate research prototypes into production systems that can be deployed, monitored, and operated at scale.
- Partner with the Digital Health team on algorithmic performance specifications, validation and verification planning, and the design of SPA or algorithm validation studies.
- Drive operational excellence through monitoring, observability, incident response, and reliability improvements for ML-powered services.
- Collaborate with researchers, product teams, and engineering stakeholders to align platform investments with health insights and member impact.
- Participate in on-call rotations for ML and data services, ensuring uptime, performance, and reliability in production environments.
- Provide technical leadership through architecture reviews, engineering standards, mentorship, and cross-functional collaboration.
Requirements
- Bachelor's degree in Computer Science, Software Engineering, Data Science, Applied Mathematics, or a related field (Master's preferred).
- 7+ years of professional experience as a Software Engineer, Machine Learning Engineer, Platform Engineer, or related role building large-scale distributed systems and/or production ML platforms.
- Strong coding skills in Python with a track record of writing clean, well-tested, production-quality code.
- Strong fundamentals in backend and service development, including APIs, distributed systems, reliability, monitoring, debugging, and performance optimization.
- Experience designing, deploying, and operating ML inference systems at scale (real-time streaming and/or large-scale batch).
- Experience building and maintaining distributed systems, event-driven architectures, or high-throughput data processing platforms.
- Experience deploying and operating services on cloud platforms (AWS or GCP), including Kubernetes, CI/CD pipelines, infrastructure automation, and observability tooling.
- Experience partnering with data science or machine learning teams to productionize models, algorithms, and data-driven features.
- Familiarity with applied machine learning concepts, model evaluation, experimentation, and performance validation.
- Demonstrated technical leadership through architecture and design ownership, setting engineering standards, and raising quality through reviews and mentorship.
- Proven track record driving measurable improvements in system performance, reliability, scalability, and/or cost at scale, while influencing cross-functional technical direction.
Nice-to-Haves
- Experience processing time-series, streaming, sensor, wearable, physiological, or other high-volume data sources.
- Experience developing software in a regulated or quality-managed environment (SaMD, medical device, healthcare, fintech, or similarly regulated domains).
Skills
Python, Machine Learning, Distributed Systems, ML Inference, APIs, Kubernetes, AWS, GCP, CI/CD, Observability