Software Engineer, Infrastructure
180k – 300k · Redwood City, CA · Onsite
Summary
Builds scalable data and ML infrastructure that supports multiple cloud providers and deployment models. Partners with founders, researchers, and engineers on core platforms, tooling, and research features to deliver reliable production systems.
About the role
What You'll Work On
- Design and build the development and production platforms that power our products, enabling reliability and security at scale
- Architect, build, and deploy our core infrastructure while supporting multiple cloud providers and various deployment models
- Accelerate company productivity by equipping your fellow engineers and teammates with excellent tooling and systems, providing a best-in-class experience
- Partner with researchers and engineers to bring new features and research capabilities to our customers
About You
- Have meaningful experience leading and building large-scale infrastructure
- Are proficient in Bash, Kubernetes, Python, and/or Terraform, or similar technologies
- Have experience working with AWS, other cloud platforms such as Azure or GCP, and/or on-prem environments
- Have expertise in debugging problems across the stack, such as networking issues, performance problems, hardware issues, or memory leaks
- Take pride in building and operating scalable, reliable, secure systems
- Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed
- Own problems end-to-end and are willing to pick up whatever knowledge you're missing to get the job done
We would love it if you had
- Built out data infrastructure from, or nearly from, scratch at a fast-growing startup
- Experience building ML/DL infrastructure and/or data infrastructure that feeds into training large ML models
Compensation
- Base salary: $180,000 to $300,000
- Significant equity
- 100% covered health benefits (medical, vision, and dental)
- 401(k) with 4% company match
- Unlimited PTO
- Annual $2,000 wellness stipend
- Annual $1,000 learning stipend
- Daily lunches and snacks
- Relocation assistance
Skills
Kubernetes, Terraform, Python, Bash, AWS, Azure, GCP, ML infrastructure, data infrastructure, debugging