
Staff Engineer, Engineering Productivity & AI Quality

253k – 308k · San Francisco, CA · Onsite · 8+ YOE
Summary

As a Staff Engineer, you will build and scale engineering productivity and AI quality systems, focusing on CI/CD gates, integration test harnesses, and agent instructions. This role is critical for enabling a small engineering team to operate with high leverage by encoding architectural taste into mechanical rules.

The Role

Harper operates like a factory with a series of modules spanning the full lifecycle from intake through renewals. Across them we run a stack of internal AI systems covering operator guidance, the operational backbone that matches risks to underwriters, autonomous communications, and voice AI for customer interactions.

You own the rails underneath the factory - the CI gates, integration test harnesses, agent instructions, PR preflight, architecture linting, dev environment reliability, and dead-code cleanup that the entire engineering team builds against. Three sub-disciplines live under this function:

  • Harness Engineering - the meta-harness on top of our frontier coding agents, OpenClaw, Hermes, and our internal agents
  • Developer Experience - CI/CD gates, build caching, merge queues, dev/staging/CI parity, internal developer platform, eval framework infrastructure
  • AI Quality - eval suite design, golden datasets, LLM-as-judge graders, production trajectory monitoring, drift detection, anti-slop guardrails

What You'll Own

  • CI/CD quality gates across Harper's most critical services - Define the minimum bar before code can merge
  • Integration test harnesses anchored to real failure modes - Every repeated operational failure becomes a regression test, a validation, or an architecture rule
  • The agent harness substrate - Sandbox lifecycle, tool routing, prompt/context layer, model-provider abstraction, multi-agent coordination
  • Repo-level agent instructions and context hygiene - AGENTS.md per repo, canonical data model docs, banned patterns. The information environment our coding agents read.
  • Automated PR preflight - Service impact summary, tests run, missing tests, model/migration changes, critical-path warnings. The robot that reviews every PR before a human does.
  • Architecture-rule enforcement - Custom lints and structural tests that encode the CTO's taste mechanically. Once a rule is written down, it never has to be argued in PR comments again.
  • Eval framework infrastructure - Pre-merge eval gating, experiment runs against curated datasets, production trajectory monitoring. All three wired together.
  • Engineering metrics that matter - Rework rate, escaped defects, flaky test count, deploy rollbacks, time-to-confident-ship, AI-generated PR quality. Anti-vanity. Anti-LOC.

You Might Be a Fit If…

  • You've built or scaled developer productivity, platform, build/test, CI/CD, or internal tooling systems at a high-growth startup or AI-infrastructure company
  • You can write and review production code at a Staff level - this is not a process or PM role
  • You have strong opinions about maintainability, architecture, testability, and developer experience - and you back them up with mechanical enforcement, not lectures
  • You're excited by AI coding agents but skeptical enough to build the guardrails they need
  • You can describe a specific lint rule, integration test, or eval-harness pattern you built that prevented a class of bugs from reaching production again
  • You write code with AI daily and routinely manage 3+ parallel coding sessions
  • You like creating leverage for other engineers more than owning a single product surface
  • You're 8–12 years into your career, with 3+ years at the Senior+ level

Requirements

  • 8+ years of software engineering experience, including Senior+ scope at a high-growth company
  • Track record of building developer productivity, platform, CI/CD, build systems, test infrastructure, or internal tooling that other engineers actually adopted
  • Production AI/ML systems experience - agent harness, eval frameworks, LLM-as-judge graders, prompt/context engineering - even if not your primary stack
  • Strong written communication - RFCs, architecture-rule docs, lint-rule rationale, internal playbooks
  • Based in San Francisco or willing to relocate

Nice to Have

  • Built or contributed to eval-framework infrastructure (open-source or internal)
  • Built developer platforms at an AI-native or high-growth company
  • Custom lint-rule / structural-test authoring at scale
  • Built or operated agent harnesses (sandboxing, isolation, agent execution environments)
  • Worked alongside a CTO whose architectural taste needed to be encoded into mechanical rules

Compensation

  • OTE: $253,000–$308,000 cash compensation (base salary + target performance bonus)
  • Equity: competitive equity, so you share in the company you are helping build
  • Location: San Francisco, in-office

Benefits

  • Health, dental, and vision insurance
  • Commuter benefits
  • Team meals and snacks
Skills

CI/CD, Integration Testing, AI/ML Systems, Developer Productivity, Platform Engineering, Build Systems, Test Infrastructure, Internal Tooling, Prompt Engineering, LLM-as-judge graders