Engineering Manager, Runtime Fabric
Lead the Runtime Fabrics team building container runtime and storage layers purpose-built for AI inference workloads. Manage systems engineers, set technical direction for containerd/runc extensions, and drive open-source contributions.
Responsibilities
Team Leadership & Culture
- Recruit, hire, and develop a high-performing team of systems engineers with deep container and Linux expertise
- Foster a culture of technical rigor, open-source contribution, and continuous improvement
- Provide regular coaching, feedback, and career development support to direct reports
- Partner with engineering leadership to define the long-term vision and roadmap for container runtime and storage infrastructure
Technical Direction
- Guide the team in extending and hardening containerd, runc, and related OCI ecosystem projects to meet the GPU-specific requirements of production AI inference, including startup performance, GPU device access, and multi-tenant isolation
- Oversee the architecture and evolution of the Baseten Delivery Network (BDN): the tiered caching and weight delivery system that makes cold starts 2–3x faster and eliminates thundering-herd failures during burst scaling events
- Drive the expansion of BDN's architecture, currently focused on model weights, to container images, training checkpoints, and deployment artifacts
- Provide technical oversight on GPU-aware isolation mechanisms for multi-tenant inference, including secure container runtimes, Linux namespace hardening, and longer-term micro-VM integration
- Ensure the team maintains end-to-end ownership of the container startup performance path, from snapshotter initialization through weight delivery to first inference request
- Champion the team's contributions back to the open-source containerd ecosystem alongside a team of core maintainers
Cross-Functional Partnership
- Act as the primary advocate for Runtime Fabrics across the organization, ensuring upstream and downstream teams have the integration support they need
- Collaborate with product and engineering stakeholders to prioritize investments based on business impact and infrastructure reliability
- Communicate team progress, technical trade-offs, and architectural decisions clearly to leadership
Requirements
- Proven experience managing and growing engineering teams in a systems, infrastructure, or low-level runtime context
- Deep familiarity with the Linux container ecosystem: containerd, runc, OCI Runtime Spec, Linux namespaces, and cgroups, with the ability to engage credibly in code reviews and architectural discussions
- Contributions to containerd/containerd, opencontainers/runc, google/gvisor, kata-containers/kata-containers, or closely related open-source projects
- Strong systems programming background in Go and/or C/C++
- Experience with distributed storage systems, content-addressable storage, or large-scale caching infrastructure
- Understanding of how container images are structured, stored, and delivered at scale
- Strong written and verbal communication skills, with the ability to influence without authority across teams
Nice to Have
- Experience with GPU device access in containers: NVIDIA Container Toolkit, CDI (Container Device Interface), or GPU-aware scheduling
- Familiarity with lazy-loading snapshotters (stargz, soci, EROFS/Nydus) or peer-to-peer image distribution
- Experience with secure container runtimes (gVisor, Sysbox) or micro-VM technologies (Firecracker, Cloud Hypervisor)
- Understanding of containerd's shim API (v2) and experience building custom shim implementations
- Background in multi-tenant infrastructure or security-sensitive serving environments
Benefits
- Competitive compensation, including meaningful equity
- 100% coverage of medical, dental, and vision insurance for employees and dependents
- Flexible PTO policy, including a company-wide Winter Break
- Paid parental leave
- Fertility and family-building stipend through Carrot
- Company-facilitated 401(k)