# Site Reliability Engineer
**Company:** [Mercor](https://hotfix.jobs/companies/mercor)
**Location:** San Francisco, CA
**Salary:** $130K-$500K
**Experience:** 5+ years
**Skills:** Kubernetes, AWS, Terraform, SRE Practices, SLIs/SLOs, Error Budgets, Incident Response, Postmortems, Distributed Systems, IaC
**Posted:** 2025-12-27
> Owns production reliability for critical systems, builds the SRE function from scratch, and introduces modern practices such as SLIs/SLOs and error budgets. Requires 5+ years of SRE experience with large-scale distributed systems.
## Job Description
## What You’ll Do
- Own reliability and production safety for core shared services and customer-facing systems.
- Partner directly with infrastructure leadership to define SRE priorities, reliability standards, and production safety roadmap.
- Repair and improve how our production systems are structured so they are stable, resource-efficient, isolated, and well-observed.
- Introduce and champion modern SRE practices (e.g., incident response, postmortems, **SLIs/SLOs**) across engineering teams.
- Collaborate with leverage engineering and applied AI teams to ensure sustainable growth.
- Represent SRE best practices internally and help teams onboard onto production in a way that is safe, scalable, and consistent with SRE principles.

## What We’re Looking For
- Experience doing true SRE work (not just operations) across multiple roles or companies.
- Deep familiarity with SRE practices as popularized by Google (e.g., **error budgets**, reliability vs. risk trade-offs, large-scale distributed systems).
- **5+ years of SRE experience**; 15+ years of overall experience is ideal for this first SRE hire.
- Proven success operating systems at scale, with a strong understanding of the challenges of large, distributed production environments.
- Strong collaboration skills; able to work effectively with cross-functional engineering teams.
- Ability to drive cultural change around reliability while remaining hands-on in building and fixing systems.
- Comfort working in high-intensity, high-availability environments where uptime and production quality are critical.

## Nice to Haves
- Experience as a founding SRE or early SRE hire, standing up SRE practices and orgs from scratch.
- Hands-on experience in the **AWS** ecosystem, **Kubernetes**, and modern IaC tooling (**Terraform**, **Spacelift**, etc.).

**Apply:** https://hotfix.jobs/jobs/site-reliability-engineer-at-mercor-7c01c731-16e7-4224-b558-eb7a8c2b6a94
**Canonical:** https://hotfix.jobs/jobs/site-reliability-engineer-at-mercor-7c01c731-16e7-4224-b558-eb7a8c2b6a94