Lead Member of Technical Staff, Inference Infrastructure

San Francisco, CA · New York, NY · DevOps / SRE · Remote · 8+ YOE
Summary

Leads architecture and strategy for deploying optimized NLP models in high-throughput, low-latency production environments on Kubernetes and cloud platforms. Mentors engineers and designs custom customer deployments. Requires 8+ years of infrastructure experience.

About the role

Responsibilities

  • Provide technical leadership across multiple teams, driving architecture and strategy for deploying optimized NLP models to production in low-latency, high-throughput, high-availability environments.
  • Serve as a key point of contact for customers, leading the design of customized deployments.
  • Mentor engineers to raise the technical bar.

Requirements

  • 8+ years of engineering experience running production infrastructure at large scale, with a track record of technical leadership.
  • Experience leading the architecture and design of large, highly available distributed systems with Kubernetes and GPU workloads.
  • Deep expertise in Kubernetes coding and support across development and production, including setting team-wide standards.
  • Extensive experience across GCP, Azure, AWS, and OCI, including multi-cloud, on-prem, and hybrid environments.
  • Experience leading the design, deployment, support, and troubleshooting of complex Linux-based computing environments at scale.
  • Ownership of compute, storage, and network resource and cost management at the organizational level.
  • Expertise in the computational characteristics of accelerators (GPUs, TPUs, custom) and in leveraging them to improve latency and throughput.
  • Deep knowledge of distributed systems, including establishing patterns and practices.
  • Proficiency in Golang, C++, or a similar language for building high-performance, scalable servers.

Nice-to-Haves

  • Exceptional collaboration, communication, mentoring, and cross-functional leadership skills.
  • Grit and adaptability in the face of complex technical challenges.

Skills

Kubernetes, GCP, Azure, AWS, OCI, Golang, C++, Linux, GPUs, TPUs, Distributed Systems