Software Engineer, ML Systems & Training Architecture
Hands-on senior software engineer focused on maintaining and improving ML training infrastructure, debugging training systems, and unblocking researchers on the robotics team.
Responsibilities
- Review, improve, and clean up code across training frameworks and adjacent infrastructure
- Identify risky or low-quality changes before they land, and raise the code quality bar without slowing the team down
- Debug issues across ML training systems, GPUs, clusters, networking, and related infrastructure
- Help researchers and engineers unblock broken training jobs, flaky workflows, and brittle internal tooling
- Improve the reliability, maintainability, and usability of the robotics team's training framework
- Move quickly on practical engineering problems that directly affect team velocity
Requirements
- Strong software engineering fundamentals and excellent code review judgment
- Experience with ML systems, training frameworks, GPUs, distributed systems, infrastructure, or similarly complex technical environments
- Ability to read and debug unfamiliar codebases quickly, with a drive to get to the root cause
- Ability to ship high-quality code quickly, with pragmatic judgment
- Low-ego, responsive, and motivated by helping researchers and engineers move faster
- Preference for being a highly effective hands-on IC over driving broad, process-heavy initiatives
- Experience reviewing messy, fast-moving, or AI-generated codebases
Senior Staff Machine Learning Engineer, Communication & Connectivity
Lead ML architecture and implementation for Airbnb's Messaging & Notifications, building recommendation engines, ranking systems, and LLM-powered experiences while mentoring engineers.
Staff Software Engineer
Architect and lead Traba's agentic platform as Founding Staff Applied Agent Engineer, building production LLM/agent systems that integrate with customer WMS/TMS/ERP systems and drive industrial operations. Requires 7+ years of engineering experience with 2+ years building production agent systems.
Member of Technical Staff — Model Optimization and Inference
Optimize inference for real-time multimodal AI avatars. Specialize in LLM and diffusion model serving, KV cache strategies, quantization, and low-latency frameworks like vLLM and TensorRT-LLM.