Member of Technical Staff, Infrastructure & Scaling

Builds, operates, and scales infrastructure for web/AI products, focusing on reliability, cost-efficiency, and handling large language models. Requires expertise in distributed systems, cloud platforms, performance tuning, and scalable architectures.

150k – 300k | Palo Alto, CA | San Francisco, CA | DevOps / SRE | Onsite

About the role

Responsibilities

  • Build, operate, and scale infrastructure, including around large language models.
  • Ensure systems are reliable and cost-efficient as the company grows.
  • Anticipate bottlenecks and evolve architecture to meet increasing demands.
  • Build tools and systems to maintain high engineering velocity.

Requirements

  • Deep intuition for distributed systems, cloud platforms, performance tuning, and scalable architecture.
  • Ability to reason about trade-offs between cost, reliability, and speed of iteration.
  • Passion for enabling teams to build faster and ship confidently.
  • Experience building infrastructure that supports products used by millions without issues.

Compensation & Benefits

  • Competitive salary
  • Generous equity
  • Visa sponsorship
  • 401(k) plan
  • Daily lunch & office snacks
  • Dinner at the office
  • Unlimited vacation
  • Caltrain pass reimbursement

Skills

Distributed Systems, Cloud Platforms, Kubernetes, Performance Tuning, Scalable Architecture, LLMs, Infrastructure as Code, Monitoring Tools, AWS, GCP