# Software Engineer, Infrastructure (All Levels)
**Company:** [Rad AI](https://hotfix.jobs/companies/rad-ai)
**Location:** San Francisco, CA
**Salary:** $160K-$300K
**Experience:** 4+ years
**Skills:** AWS, Kubernetes, Docker, Terraform, Python, Linux, GCP, CI/CD, HIPAA, Observability
**Posted:** 2026-02-06
> Designs, builds, and operates scalable cloud infrastructure on AWS with Kubernetes and serverless technologies to support AI healthcare products. Requires 4+ years of experience in cloud-native platforms, IaC, automation, and reliability practices for regulated environments.
## Job Description
## What You’ll Be Doing
- Influence the technical direction for infrastructure and platform capabilities that support our rapidly growing AI product suite.
- Architect and evolve our cloud infrastructure (primarily on AWS) across container orchestration (**Kubernetes**, **Elastic Container Service**), serverless (e.g., **Lambda**), virtual machines (e.g., **EC2**), and data stores to support current and future products.
- Work closely with Platform leadership, product engineering, data, and ML teams to design systems that are robust, observable, and compliant in a healthcare environment.
- Define and drive infrastructure strategy for the Platform org—partnering with engineering leadership to align roadmaps, set standards, and sequence work for maximum business impact.
- Secure networking, identity, and access patterns across environments.
- Improve reliability and operational excellence by defining SLOs, SLIs, and error budgets for core platform services.
- Lead and participate in blameless post-incident reviews and translate learnings into systemic improvements.
- Own observability and monitoring strategy across logging, metrics, and tracing, ensuring we can detect, debug, and prevent issues efficiently.
- Mentor and level up engineers across Platform and product teams—reviewing design docs, guiding architecture decisions, and modeling high standards for reliability, security, and maintainability.
- Partner with security and compliance stakeholders to ensure our infrastructure and operational practices meet **HIPAA** and other healthcare requirements.
- Advocate for and implement developer experience improvements, such as better **CI/CD** workflows, faster feedback loops, and tooling that reduces cognitive load for product teams.

## Who We’re Looking For
- Bring 4+ years of hands-on infrastructure / platform development experience (or equivalent practical experience) in modern, cloud-native environments, with a track record of owning critical systems in production.
- Have deep expertise with **AWS** (preferred) and/or **GCP**, including core networking, compute, storage, and managed services.
- Are highly proficient in at least one programming/scripting language used for infrastructure work (**Python** preferred).
- Have extensive experience building tooling and automation for other engineers.
- Have strong experience with **Kubernetes**, containers (**Docker**), and container orchestration, and understand how to operate these systems reliably at scale.
- Are comfortable with **Infrastructure as Code** (**Terraform** preferred, or **Pulumi** or similar) and Git-based workflows.
- Possess solid **Linux** fundamentals and are comfortable debugging issues at the OS, networking, and application layers.
- Have demonstrable experience leading complex, cross-team initiatives from design through rollout—communicating tradeoffs, aligning stakeholders, de-risking launches, and measuring impact.
- Communicate clearly and empathetically with both technical and non-technical partners, and enjoy mentoring engineers at multiple levels.
- Take a data-informed, pragmatic approach to decision-making—balancing ideal architecture with business needs, delivery timelines, and team capacity.

## Nice to Haves
- Experience in regulated environments (e.g., **HIPAA**) or prior work in healthcare or health tech.
- Background in platform or security engineering, especially around access control, encryption, auditability, and compliance.
- Experience working closely with ML / data teams or with ML platforms (e.g., **Airflow**, **Ray**, ML pipelines, model serving stacks).
- Familiarity with observability stacks (**CloudWatch**, **New Relic**, **Grafana**, **OpenTelemetry**, etc.).
- Experience designing or operating internal developer platforms, SDKs, or reusable frameworks that standardize how services are built and deployed.
- Prior experience at a fast-growing startup where you’ve helped scale infrastructure, processes, and teams.

**Apply:** https://hotfix.jobs/jobs/software-engineer-infrastructure-all-levels-at-rad-ai-904df12a-c71c-4d3a-baf8-8159b965eb0c
**Canonical:** https://hotfix.jobs/jobs/software-engineer-infrastructure-all-levels-at-rad-ai-904df12a-c71c-4d3a-baf8-8159b965eb0c