Platform Engineer - AI Control Plane

San Francisco, CA | DevOps / SRE | Onsite
Summary

As a Platform Engineer (SRE) focusing on the AI Control Plane, you will identify architectural changes, foster a culture of reliability, design operational processes, participate in on-call rotations, build monitoring systems, and debug production issues.

About the role

About the AI control plane

Our platform enables enterprises and fast-moving startups to get the most out of Claude, Codex, Cursor and other providers by:

  • Leveraging enterprise identity systems to secure access to MCP servers
  • Securing agent sessions to ensure sensitive data and enterprise policies are respected
  • Providing easy-to-use primitives to build new MCPs, skills and assistants (secure claw)
  • Providing a deep understanding of AI usage within an organisation, from tool use and token spend to complete agent sessions

About the Role

This is a unique, high-impact opportunity to join a team that is passionate about its craft and about solving hard, greenfield product problems. You’ll be working on an early product with a fast-moving team that’s done this before, collaborating directly with founders (and customers) on a daily basis. Some of the ways you will have impact:

  • Identify architectural changes to improve reliability, performance and availability.
  • Foster a culture of reliability across Speakeasy's engineering organization.
  • Design and implement key operational processes such as deployments, upgrades, rollbacks, and postmortem reviews.
  • Join a core engineering team and participate in the on-call rotation, responding to production incidents.
  • Build monitoring systems that ensure the highest quality service for our customers.
  • Debug production issues across all services and levels of the stack.

You’re a good fit if...

  • You want to join a talent-dense team of domain experts across developer tools, languages, and infrastructure, over a quarter of whom are ex-founders.
  • Ownership of the full development lifecycle excites you: building, shipping, running support, maintaining infrastructure and measuring impact.
  • You have a track record of taking full ownership of improving a product's reliability and uptime.
  • You have an exceptional ability to learn and pick up new frameworks, and you can work across the stack from backend to frontend.
  • You can participate in the on-call rotation and respond to production incidents.
  • You can work in person in our San Francisco office.
Skills
AI, Cloud Platforms, System Architecture, Monitoring, Debugging, Backend Development, Frontend Development