Platform Engineer - AI Control Plane
San Francisco, CA | DevOps / SRE | Onsite
Summary
As a Platform Engineer (SRE) focusing on the AI Control Plane, you will identify architectural changes, foster a culture of reliability, design operational processes, participate in on-call rotations, build monitoring systems, and debug production issues.
About the AI control plane
Our platform enables enterprises and fast-moving startups to get the most out of Claude, Codex, Cursor and other providers by:
- Leveraging enterprise identity systems to secure access to MCP servers.
- Securing agent sessions to ensure sensitive data and enterprise policies are respected.
- Providing easy-to-use primitives to build new MCP servers, skills, and assistants (secure claw).
- Providing deep insight into AI usage within an organization, from tool use and token spend to complete agent sessions.
About the Role
This is a unique, high-impact opportunity to join a team that is passionate about its craft and about solving greenfield, hard product problems. You'll be working on an early product with a fast-moving team that's done this before, collaborating directly with founders (and customers) every day. Some of the ways you will have impact:
- Identify architectural changes to improve reliability, performance and availability.
- Foster a culture of reliability across Speakeasy's engineering organization.
- Design and implement key operational processes such as deployments, upgrades, rollbacks, and postmortem reviews.
- Join a core engineering team and participate in the on-call rotation, responding to production incidents.
- Build monitoring systems that ensure the highest quality service for our customers.
- Debug production issues across all services and levels of the stack.
You’re a good fit if...
- You want to join a talent-dense team of ex-founders (over a quarter of the team) and domain experts across developer tools, languages, and infrastructure.
- You're excited to own the full development lifecycle: building, shipping, running support, maintaining infrastructure, and measuring impact.
- You have a track record of acting with full agency to improve a product's reliability and uptime.
- You have an exceptional ability to learn and pick up new frameworks, and you can work across the stack from backend to frontend.
- You can participate in the on-call rotation and respond to production incidents.
- You can work in person at our San Francisco office.
Skills
AI, Cloud Platforms, System Architecture, Monitoring, Debugging, Backend Development, Frontend Development