Platform Engineer

Builds the backend infrastructure and core platform for an AI agent cloud, including VM hypervisors, LLM sandboxes, networking, and orchestration. Requires 5+ years of experience in distributed systems and Linux administration. The role is onsite in San Francisco.

180k – 250k | San Francisco, CA | DevOps / SRE | Onsite | 5+ YOE

About the role

Responsibilities

Designing and building the E2B backend and core infrastructure
Working with virtualization and sandboxing technologies such as Firecracker and gVisor, and with Linux systems
Building and optimizing runtimes and sandboxes for LLMs
Developing networking solutions for secure, isolated environments
Monitoring resources and optimizing sandbox performance
Solving general infrastructure challenges at scale
Working with orchestration technologies like Kubernetes or Nomad
Collaborating closely with Distributed Systems Engineers

Requirements

5+ years building infrastructure, especially distributed systems
5+ years of Linux administration, with knowledge of Linux fundamentals: bootloader, kernel, package management, networking, storage, namespaces, and containers
Experience building and operating infrastructure at scale
Excited to work in person from San Francisco on a DevTool product
Detail-oriented with great taste in design and engineering
Comfortable working closely with users
Proactive, not afraid to take ownership of part of the product
Excited to take projects from 0 → 1 with the support of the team

Benefits

Full healthcare, vision, and dental insurance
Unlimited PTO

Skills

Linux, Kubernetes, Firecracker, gVisor, Nomad, Distributed Systems, Namespaces, Containers, Networking, Sandboxes
