# Product Reliability Engineer - Defense
**Company:** [Palantir](https://hotfix.jobs/companies/palantir)
**Location:** Washington, DC
**Skills:** Java, Go, Prometheus, Distributed Systems, Observability, Cloud Infrastructure, Load Balancing, Monitoring, Storage Systems, Data Processing
**Posted:** 2026-01-27
> Owns end-to-end service reliability at Palantir, troubleshooting outages, improving observability, and enhancing codebases for stability. Requires engineering background, Java experience, and US security clearance eligibility.
## Job Description
## Core Responsibilities
- Continuously invest in documentation, metrics, monitors and other troubleshooting tools
- Participate in on-call rotations during business hours and occasional weekends
- Diagnose, resolve, and prevent issues encountered in the field. Deliver end-to-end improvements to core products based on these issues
- Improve observability by refactoring codepaths and introducing telemetry
- Identify and implement data-driven opportunities for improved service resilience
- Develop strategic opinions on stability investments and inform the vision for long-term product stability

## What We Require
- Engineering background in Computer Science, Mathematics, Software Engineering, Physics, or a similar field
- Ability to work with a high degree of ownership and a strong sense of urgency in a dynamic environment
- Experience producing code in backend languages such as Java, as part of a past role or personal projects
- Familiarity with storage and data processing systems and cloud infrastructure
- Strong written and verbal communication and ability to iterate quickly with teammates and incorporate feedback
- Eligibility and willingness to obtain a US security clearance

## What We Value
- Comfort with and curiosity about large-scale production systems and technologies (e.g., load balancing, monitoring, distributed systems, configuration management)
- Confidence in troubleshooting complex issues independently using observability tools and stack traces
- Familiarity with monitoring tools such as Prometheus and health checks
- Experience coding with Java, Go, and/or web technologies (e.g., HTML, CSS, JavaScript, Python/Ruby, Django/Flask/Ruby on Rails)
- Track record of identifying bugs in codebases and contributing fixes that lead to long-term service stability
- Demonstrated ability to make data-driven decisions and engage with stakeholders on strategy

**Apply:** https://hotfix.jobs/jobs/product-reliability-engineer-defense-at-palantir-01060f88-5689-41ad-af16-151c22e8d910
**Canonical:** https://hotfix.jobs/jobs/product-reliability-engineer-defense-at-palantir-01060f88-5689-41ad-af16-151c22e8d910