# Software Engineer, Agentic Runtime
**Company:** [Glean](https://hotfix.jobs/companies/glean)
**Location:** Palo Alto, CA; San Francisco, CA
**Salary:** $170K-$265K
**Experience:** 3+ years
**Skills:** Python, Go, Java, C++, Kubernetes, GCP, AWS, Azure, Kafka, Redis, gRPC, WebSockets, OpenTelemetry, OpenAI, Anthropic
**Posted:** 2026-03-16
> Builds low-latency runtime services for AI agents, handling orchestration, tool calling, model routing, and observability. Requires 3+ years in distributed systems, strong coding in Python/Go/Java/C++, Kubernetes experience, and LLM familiarity.
## Job Description
## Responsibilities
- Own impactful runtime problems end-to-end, from architecture and design to production launch and ongoing reliability.
- Build and evolve core services for session lifecycle, streaming responses (e.g., gRPC/WebSockets), structured tool execution, memory/state, and policy/guardrails.
- Design for performance, correctness, and cost: reduce p50/p95 latency, improve tail behavior, and optimize token/tool budgets.
- Integrate with leading LLM providers (e.g., OpenAI, Anthropic, Google Gemini) and internal evaluation frameworks to improve quality and predictability.
- Harden the platform with fault isolation, retries, timeouts, circuit-breaking, backpressure, and graceful degradation.
- Instrument deep observability (tracing, metrics, logs) and create playbooks/SLOs for high availability and on-call excellence.
- Collaborate closely with product, quality, and application teams to prioritize the most impactful roadmap investments.

## Requirements
- 3+ years of software engineering experience building production distributed systems or cloud-native applications.
- BS/BA in Computer Science or related field, or equivalent practical experience.
- Strong coding skills in at least one of: **Python**, **Go**, **Java**, or **C++**, with a focus on reliability, performance, and tests.
- Product-minded: prioritize customer impact, clear SLAs/SLOs, and pragmatic iteration.
- Ownership-driven with a positive, proactive attitude; comfortable leading projects or learning from battle-tested engineers.
- Experience operating services on **Kubernetes** and at least one major cloud (e.g., **GCP**, **AWS**, or **Azure**).
- Familiarity with event/streaming systems (e.g., **Pub/Sub**, **Kafka**), caching (e.g., **Redis**), and data stores for low-latency paths.
- Practical understanding of LLM/agent building blocks: tool/function calling, structured outputs, streaming, and model selection/routing.
- Strong observability and debugging skills: tracing (e.g., **OpenTelemetry**), metrics, dashboards, and production forensics.

## Nice-to-Haves
- Background in one or more of the following areas: policy/guardrails, multi-tenant isolation, rate-limiting, concurrency control, cost optimization.

## Compensation & Benefits
- Standard base salary range: **$170,000 - $265,000** annually (determined by location, level, knowledge, skills, and experience).
- Eligible for variable compensation, equity, and benefits.
- Comprehensive benefits: medical, vision, and dental; generous time off; 401(k); home office stipend; annual education and wellness stipends; company events; daily healthy lunches.

**Apply:** https://hotfix.jobs/jobs/software-engineer-agentic-runtime-at-glean-9042d3f0-d977-4b86-b6c7-ae5ac0df0c52
**Canonical:** https://hotfix.jobs/jobs/software-engineer-agentic-runtime-at-glean-9042d3f0-d977-4b86-b6c7-ae5ac0df0c52