Responsibilities

Provide technical leadership across multiple teams, driving the architecture and strategy for deploying optimized NLP models to production in low-latency, high-throughput, high-availability environments.
Serve as a key point of contact for customers, leading the design of customized deployments.
Mentor engineers to raise the technical bar.

Requirements

8+ years of engineering experience running production infrastructure at large scale, with a track record of technical leadership.
Experience leading the architecture and design of large, highly available distributed systems running Kubernetes and GPU workloads.
Deep expertise in Kubernetes development, coding, and production support, including setting team-wide standards.
Extensive experience with GCP, Azure, AWS, and OCI, as well as multi-cloud, on-premises, and hybrid environments.
Experience leading the design, deployment, support, and troubleshooting of complex Linux-based computing environments at scale.
Experience owning compute, storage, and network resource and cost management at the organizational level.
Expertise in the computational characteristics of accelerators (GPUs, TPUs, and custom hardware) and in leveraging them to improve latency and throughput.
Deep knowledge of distributed systems, including establishing patterns and best practices.
Proficiency in Golang, C++, or a similar language for building high-performance, scalable servers.

Exceptional collaboration, communication, mentoring, and cross-functional leadership skills.
Grit and adaptability in tackling complex technical challenges.