
Software Engineer, Infrastructure (All Levels)

Designs, builds, and operates scalable cloud infrastructure on AWS with Kubernetes and serverless tech to support AI healthcare products. Requires 4+ years experience in cloud-native platforms, IaC, automation, and reliability practices for regulated environments.

160k – 300k · San Francisco, CA · DevOps / SRE · Onsite · 4+ YOE

About the role

What You’ll Be Doing

  • Influence the technical direction for infrastructure and platform capabilities that support our rapidly growing AI product suite.
  • Architect and evolve our cloud infrastructure (primarily on AWS) across container orchestration (Kubernetes, Elastic Container Service), serverless (e.g., Lambda), virtual machines (e.g., EC2), and data stores to support current and future products.
  • Work closely with Platform leadership, product engineering, data, and ML teams to design systems that are robust, observable, and compliant in a healthcare environment.
  • Define and drive infrastructure strategy for the Platform org—partnering with engineering leadership to align roadmaps, set standards, and sequence work for maximum business impact.
  • Secure networking, identity, and access patterns across environments.
  • Improve reliability and operational excellence by defining SLOs, SLIs, and error budgets for core platform services.
  • Lead and participate in blameless post-incident reviews and translate learnings into systemic improvements.
  • Own observability and monitoring strategy across logging, metrics, and tracing, ensuring we can detect, debug, and prevent issues efficiently.
  • Mentor and level up engineers across Platform and product teams—reviewing design docs, guiding architecture decisions, and modeling high standards for reliability, security, and maintainability.
  • Partner with security and compliance stakeholders to ensure our infrastructure and operational practices meet HIPAA and other healthcare requirements.
  • Advocate for and implement developer experience improvements, such as better CI/CD workflows, faster feedback loops, and tooling that reduces cognitive load for product teams.

Who We’re Looking For

  • Bring 4+ years of hands-on infrastructure / platform development experience (or equivalent practical experience) in modern, cloud-native environments, with a track record of owning critical systems in production.
  • Have deep expertise with AWS (preferred) and/or GCP, including core networking, compute, storage, and managed services.
  • Are highly proficient in at least one programming/scripting language used for infrastructure work (Python preferred).
  • Have extensive experience building tooling and automation for other engineers.
  • Have strong experience with Kubernetes, containers (Docker), and container orchestration, and understand how to operate these systems reliably at scale.
  • Are comfortable with Infrastructure as Code (Terraform preferred; Pulumi or similar also welcome) and Git-based workflows.
  • Possess solid Linux fundamentals and are comfortable debugging issues at the OS, networking, and application layers.
  • Have demonstrable experience leading complex, cross-team initiatives from design through rollout—communicating tradeoffs, aligning stakeholders, de-risking launches, and measuring impact.
  • Communicate clearly and empathetically with both technical and non-technical partners, and enjoy mentoring engineers at multiple levels.
  • Take a data-informed, pragmatic approach to decision-making—balancing ideal architecture with business needs, delivery timelines, and team capacity.

Nice to Haves

  • Experience in regulated environments (e.g., HIPAA) or prior work in healthcare or health tech.
  • Background in platform or security engineering, especially around access control, encryption, auditability, and compliance.
  • Experience working closely with ML / data teams or with ML platforms (e.g., Airflow, Ray, ML pipelines, model serving stacks).
  • Familiarity with observability stacks (CloudWatch, New Relic, Grafana, OpenTelemetry, etc.).
  • Experience designing or operating internal developer platforms, SDKs, or reusable frameworks that standardize how services are built and deployed.
  • Prior experience at a fast-growing startup where you’ve helped scale infrastructure, processes, and teams.

Skills

AWS, Kubernetes, Docker, Terraform, Python, Linux, GCP, CI/CD, HIPAA, Observability
