Staff Engineer, Engineering Productivity & AI Quality

$253k – $308k | San Francisco, CA | Onsite | 8+ YOE | Jun 1

Summary

As a Staff Engineer, you will build and scale engineering productivity and AI quality systems, focusing on CI/CD gates, integration test harnesses, and agent instructions. This role is critical for enabling a small engineering team to operate with high leverage by encoding architectural taste into mechanical rules.

The Role

Harper operates like a factory with a series of modules spanning the full lifecycle from intake through renewals. Across them we run a stack of internal AI systems covering operator guidance, the operational backbone that matches risks to underwriters, autonomous communications, and voice AI for customer interactions.

You own the rails underneath the factory: the CI gates, integration test harnesses, agent instructions, PR preflight, architecture linting, dev environment reliability, and dead-code cleanup that the entire engineering team builds against. Three sub-disciplines live under this function:

Harness Engineering - the meta-harness on top of our frontier coding agents, OpenClaw, Hermes, and our internal agents
Developer Experience - CI/CD gates, build caching, merge queues, dev/staging/CI parity, internal developer platform, eval framework infrastructure
AI Quality - eval suite design, golden datasets, LLM-as-judge graders, production trajectory monitoring, drift detection, anti-slop guardrails

What You'll Own

CI/CD quality gates across Harper's most critical services - Define the minimum bar before code can merge
Integration test harnesses anchored to real failure modes - Every repeated operational failure becomes a regression test, a validation, or an architecture rule
The agent harness substrate - Sandbox lifecycle, tool routing, prompt/context layer, model-provider abstraction, multi-agent coordination
Repo-level agent instructions and context hygiene - AGENTS.md per repo, canonical data model docs, banned patterns. The information environment our coding agents read.
Automated PR preflight - Service impact summary, tests run, missing tests, model/migration changes, critical-path warnings. The robot that reviews every PR before a human does.
Architecture-rule enforcement - Custom lints and structural tests that encode the CTO's taste mechanically. Once a rule is written down, it never has to be argued in PR comments again.
Eval framework infrastructure - Pre-merge eval gating, experiment runs against curated datasets, production trajectory monitoring. All three wired together.
Engineering metrics that matter - Rework rate, escaped defects, flaky test count, deploy rollbacks, time-to-confident-ship, AI-generated PR quality. Anti-vanity. Anti-LOC.
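
As an illustration of the architecture-rule enforcement item above, here is a minimal sketch of a structural test that flags banned cross-package imports. It is not Harper's actual tooling: the package names (`intake`, `renewals`), the `BANNED_IMPORTS` table, and the `find_banned_imports` helper are all hypothetical, and the only dependency is Python's standard `ast` module.

```python
import ast

# Hypothetical rule table: package -> top-level packages it must not import.
BANNED_IMPORTS = {"intake": {"renewals"}}


def find_banned_imports(source: str, package: str, banned=BANNED_IMPORTS) -> list[tuple[int, str]]:
    """Return sorted (line, module) pairs for imports `package` may not make."""
    forbidden = banned.get(package, set())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x.y import z" only; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare on the top-level package name only.
            if name.split(".")[0] in forbidden:
                violations.append((node.lineno, name))
    return sorted(violations)
```

A CI gate could run a check like this over every file in a package and fail the build when the returned list is non-empty, which is one way a written-down rule stops being a PR-comment argument.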

You Might Be a Fit If…

You've built or scaled developer productivity, platform, build/test, CI/CD, or internal tooling systems at a high-growth startup or AI-infrastructure company
You can write and review production code at a Staff level - this is not a process or PM role
You have strong opinions about maintainability, architecture, testability, and developer experience - and you back them up with mechanical enforcement, not lectures
You're excited by AI coding agents but skeptical enough to build the guardrails they need
You can describe a specific lint rule, integration test, or eval-harness pattern you built that prevented a class of bugs from reaching production again
You write code with AI daily and routinely manage 3+ parallel coding sessions
You like creating leverage for other engineers more than owning a single product surface
You're 8–12 years into your career, with 3+ years at the Senior+ level

Requirements

8+ years of software engineering experience, including senior+ scope at a high-growth company
Track record of building developer productivity, platform, CI/CD, build systems, test infrastructure, or internal tooling that other engineers actually adopted
Production AI/ML systems experience (agent harnesses, eval frameworks, LLM-as-judge graders, prompt/context engineering), even if not your primary stack
Strong written communication - RFCs, architecture-rule docs, lint-rule rationale, internal playbooks
Based in San Francisco or willing to relocate

Nice to Have

Built or contributed to eval-framework infrastructure (open-source or internal)
Built developer platforms at an AI-native or high-growth company
Custom lint-rule / structural-test authoring at scale
Built or operated agent harnesses (sandboxing, isolation, agent execution environments)
Worked alongside a CTO whose architectural taste needed to be encoded into mechanical rules

Compensation

OTE: $253,000–$308,000 cash compensation (base salary + target performance bonus)
Equity: competitive equity, so you share in the company you are helping build
Location: San Francisco, in-office

Benefits

Health, dental, and vision insurance
Commuter benefits
Team meals and snacks

Skills

CI/CD, Integration Testing, AI/ML Systems, Developer Productivity, Platform Engineering, Build Systems, Test Infrastructure, Internal Tooling, Prompt Engineering, LLM-as-judge graders
