Lead Member of Technical Staff, Inference Infrastructure
San Francisco, CA | New York, NY | DevOps / SRE | Remote | 8+ YOE
Summary
Leads architecture and strategy for deploying optimized NLP models in high-throughput, low-latency production environments using Kubernetes and cloud platforms. Mentors engineers and leads the design of customized customer deployments. Requires 8+ years of infrastructure experience.
About the role
Responsibilities
- Provide technical leadership across multiple teams, driving architecture and strategy for deploying optimized NLP models to production in low-latency, high-throughput, high-availability environments.
- Serve as a key point of contact for customers, leading the design of customized deployments.
- Mentor engineers to raise the technical bar.
Requirements
- 8+ years of engineering experience running production infrastructure at large scale, with a track record of technical leadership.
- Experience leading the architecture and design of large, highly available distributed systems with Kubernetes and GPU workloads.
- Deep expertise in Kubernetes coding and support across dev and production, including setting team-wide standards.
- Extensive experience across GCP, Azure, AWS, and OCI, including multi-cloud, on-prem, and hybrid environments.
- Experience leading the design, deployment, support, and troubleshooting of complex Linux-based computing environments at scale.
- Experience owning compute, storage, and network resource and cost management at the organizational level.
- Expertise in the computational characteristics of accelerators (GPUs, TPUs, and custom accelerators) and in leveraging them to improve latency and throughput.
- Deep knowledge of distributed systems, including establishing patterns and practices.
- Proficiency in Golang, C++, or a similar language for building high-performance, scalable servers.
Nice-to-Haves
- Exceptional collaboration, communication, mentoring, and cross-functional leadership skills.
- Grit and adaptability when facing complex technical challenges.
Skills
Kubernetes, GCP, Azure, AWS, OCI, Golang, C++, Linux, GPUs, TPUs, Distributed Systems