Site Reliability Engineer
Builds and maintains reliable cloud infrastructure for large-scale ML on biosignal data, managing Kubernetes clusters, CI/CD pipelines, observability, and security. Requires 5+ years of SRE/DevOps experience, Kubernetes expertise, IaC with Terraform, and cloud proficiency.
Responsibilities
- Design and implement infrastructure as code solutions that improve reliability, security, and maintainability of our cloud infrastructure
- Lead and execute major infrastructure initiatives including cluster upgrades, security improvements, and architectural changes
- Develop and maintain CI/CD pipelines that enable teams to deploy safely and efficiently
- Improve observability across our systems through enhanced monitoring, logging, and alerting
- Participate in an on-call rotation and lead incident response efforts when issues arise
- Collaborate with development teams to improve application reliability and performance
- Maintain and enhance our security posture through infrastructure hardening and automation
- Create and maintain documentation for infrastructure, deployment processes, and incident response procedures
Requirements
- Strong experience with Kubernetes administration, including cluster management, security, and troubleshooting
- Proven track record implementing infrastructure as code using Terraform or similar tools
- Experience building and maintaining CI/CD pipelines, particularly with GitHub Actions, Azure DevOps, or ArgoCD
- Solid understanding of container technologies and build processes, especially Docker
- Strong knowledge of a major cloud provider (e.g. AWS), including networking, security, and infrastructure services; experience with Azure is a plus
- Experience with incident response and on-call responsibilities in a production environment
- Deep experience with Linux systems administration and debugging; familiarity with Windows Server environments is a plus
- Proficiency in at least one programming language (e.g. Python, Go, or TypeScript)
- Understanding of security and networking concepts, including OAuth2/OIDC, DNS, TLS, and TCP/UDP
Approximate experience: Bachelor's degree + 5-8 years of experience in SRE, DevOps, or similar
Compensation
Salary range: $150,000 – $170,000 (adjusted based on experience, skills, and location). Includes equity, PTO and other benefits.