Principal GenAI Platform Engineer (US)

179k – 199k · United States · Remote

Summary

Build and maintain GenAI platform infrastructure including model gateways, vector DBs, observability, and secure access controls for production LLM workloads.

About the role

Key Responsibilities

  • Design, build, and maintain the core infrastructure layer supporting GenAI products, including model gateways, prompt/versioning stores, vector databases, and LLM evaluation tools.
  • Implement secure access controls and authentication mechanisms integrated by default into the AI platform components.
  • Develop and manage observability, monitoring, and logging solutions for GenAI workloads and infrastructure.
  • Collaborate closely with product and engineering teams to integrate GenAI infrastructure with agent frameworks and downstream applications.
  • Optimize infrastructure for scalability, high availability, and cost efficiency in production workloads.

Qualifications & Skills

  • Extensive experience building and maintaining AI platform infrastructure, including Kubernetes and container security.
  • Demonstrated expertise in observability and monitoring frameworks, with a focus on real-time performance (e.g., OpenTelemetry, MLflow).
  • Experience with AI infrastructure components such as vector databases, prompt/versioning stores, and AI IDEs.

Preferred Experience

  • Familiarity with vLLM, SGLang, or a similar framework for hosting LLM inference workloads.
  • Experience with CI/CD pipelines and automation for AI model deployment and platform operations.
  • Strong knowledge of authentication and authorization frameworks integrated into AI platforms.
Skills

Kubernetes, OpenTelemetry, MLflow, Vector Databases, vLLM, SGLang, CI/CD, Authentication, Authorization, Container Security