Infrastructure Engineer
163k – 204k · United States · Remote
Summary
The Infrastructure Engineer builds and maintains internal engineering services and improves observability, CI/CD pipelines, and cloud infrastructure using tools such as Kubernetes and AWS. The role is remote and requires experience with distributed systems, infrastructure as code, and operating managed services.
About the role
Key Responsibilities
- Work as part of a team of engineers to design, build, test, and document core software components.
- Take ownership of the running services that make up Tailscale’s product by building for observability, participating in incident response, and fielding customer support escalations.
- Analyze and improve the efficiency, scalability, and stability of system resources.
Example Deliverables
- Improve observability through metrics, alerting, logging, and telemetry integration.
- Identify and build improvements to continuous deployment.
- Use infrastructure as code to make changes in a cloud environment.
- Collaborate with other engineering teams to build cross-functional infrastructure improvements.
- Automate upgrades for managed services and VMs.
What We Are Looking For
- Experience with CI/CD, secrets management, infrastructure as code, and observability.
- Experience with distributed systems.
- Experience operating managed services in a cloud environment (preferably AWS).
- Experience operating Kubernetes in production is a strong plus.
- Familiarity with networks (IP addressing, routing, etc.).
- Most of the system outside the front end is written in Go; experience with Go is a plus.
- Ability to give and receive constructive feedback and to work independently.
- Flexibility to adjust to the dynamic nature of a startup.
- Excellent written and verbal communication skills.
Skills
Go · Kubernetes · AWS · CI/CD · Infrastructure as Code · Observability · Distributed Systems · Secrets Management · Continuous Deployment