Member of Technical Staff, AI Platform & Architecture (Infrastructure)

Builds and maintains distributed AI infrastructure for model training, inference, and data pipelines. Requires experience in GenAI systems, distributed computing, Python/Go, and scaling AI workloads on GPUs/cloud.

256k – 276k | San Francisco, CA | Boston, MA | New York, NY | +1 more | DevOps / SRE | Hybrid

About the role

The Opportunity

As a Member of Technical Staff on AI Infrastructure, you will build and maintain the foundational systems and distributed infrastructure that power AI model post-training, inference, and data pipelines. You will collaborate with engineering and research teams to ensure the performance, scalability, and reliability of critical AI systems.

What You’ll Do

  • Design and implement large-scale, distributed AI infrastructure and services
  • Optimize performance for GPU/xPU accelerators and cloud environments
  • Build tools for observability, reliability, and scaling of AI workloads
  • Partner with cross-functional teams to define AI infrastructure requirements and roadmap
  • Contribute to architectural design and system longevity

About You

  • Have experience with GenAI infrastructure systems, distributed systems, cloud computing, and high-performance infrastructure
  • Are proficient in Python, Go, or a similar programming language
  • Understand scaling challenges specific to AI workloads and accelerators
  • Thrive in fast-paced, collaborative engineering environments

Compensation: Base salary $256,000 - $276,000 plus equity.

Skills

Python, Go, Distributed Systems, GenAI Infrastructure, GPU, Cloud Computing, Kubernetes, Observability, AI Workloads, High-Performance Computing

Similar roles

DevOps / SRE jobs

Member of Technical Staff, AI Reliability & Monitoring Engineering Lead

Lead AI reliability engineering for Postman's API and agentic systems, building monitoring, observability, and automation for high availability. Requires strong SRE/DevOps background in large-scale AI infrastructure and cloud platforms.

256k – 276k | San Francisco, CA | DevOps / SRE | Hybrid | SRE | SLOs

Staff Engineer, Engineering Productivity & AI Quality

As a Staff Engineer, you will build and scale engineering productivity and AI quality systems, focusing on CI/CD gates, integration test harnesses, and agent instructions. This role is critical for enabling a small engineering team to operate with high leverage by encoding architectural taste into mechanical rules.

253k – 308k | San Francisco, CA | DevOps / SRE | On-site | 8+ YOE | CI/CD | AI/ML Systems

Staff Site Reliability Engineer

Lead EarnIn's AI-first reliability engineering strategy. Define SLOs/SLIs, build AI agents for incident response and on-call automation, and partner with engineering teams to embed AI-assisted operations across production systems on AWS.

252k – 308k | Mountain View, CA | DevOps / SRE | Hybrid | 7+ YOE | Go | SRE

Senior Staff Software Engineer, Infrastructure

Designs and implements large-scale public cloud infrastructure and builds complex distributed systems and microservices. Requires 10+ years of experience; expert skills in performance tuning, concurrency, and multiple cloud providers such as AWS, GCP, and Azure; and a graduate degree or equivalent.

260k – 325k | United States | DevOps / SRE | Remote | 10+ YOE | Go | AWS

Member of Technical Staff

Hands-on technical role building AI-powered tools, infrastructure, and processes to accelerate engineering velocity and product delivery at an AI search company.

250k – 405k | San Francisco, CA +1 | DevOps / SRE | Hybrid | 5+ YOE | Go | Rust