Software Engineer, Infrastructure
Builds and scales cloud infrastructure for Render's developer platform, focusing on container orchestration, networking, storage, and AI workloads. Requires 5+ years of experience with Kubernetes, IaC tools like Terraform/Pulumi/Ansible, and production systems at scale.
What You'll Do
- Own Render's core infrastructure across multiple data centers and regions.
- Help offer unique capabilities to Render customers through infrastructure innovation.
- Plan and architect for rapidly increasing scale.
- Debug issues at all levels in our infrastructure stack.
- Improve the performance and reliability of our infrastructure through increased observability, load testing, and chaos engineering.
- Collaborate with other engineers to help keep our platform stable, predictable, and secure.
- Participate in our on-call rotation with the rest of the engineering team.
What We're Looking For
- At least 5 years of experience building and scaling cloud infrastructure.
- Experience developing, maintaining, and debugging production systems at scale.
- Experience building, operating, and scaling Kubernetes clusters or similar resource/container orchestration systems.
- Experience with infrastructure-as-code tools like Terraform, Pulumi, and Ansible.
Nice-to-haves
- Experience with Linux kernel and/or container optimization.
- Familiarity with observability tools like Datadog, Grafana, and OpenTelemetry.
- Experience hosting PostgreSQL (or similar data stores) at scale.
- Security hardening skills, especially in the context of untrusted workloads.
Staff Site Reliability Engineer - Observability
Staff SRE focused on building and scaling a comprehensive observability platform on GCP using Terraform, Splunk, and Grafana. Requires 5+ years of GCP observability experience and strong coding skills in Python or Go.
Senior Platform Reliability Engineer
Establishes reliability standards, observability, and incident response practices across engineering teams. Requires 6+ years of experience operating production systems at scale with AWS, Kubernetes, Terraform, and modern observability tooling.