Infrastructure Software Engineer, Enterprise GenAI

Build and scale enterprise GenAI infrastructure across multiple cloud providers (AWS, Azure, GCP), implementing integrations and architecting systems for regulated industries. Requires 4+ years of experience, proficiency in Python or JavaScript/TypeScript and SQL, experience with Kubernetes, and familiarity with AI technologies such as LLMs.

216k – 270k · San Francisco, CA · New York, NY · Seattle, WA · DevOps / SRE · Onsite · 4+ YOE

About the role

What You’ll Do

  • Architect multi-cloud systems and abstractions that allow the SGP platform to run on top of existing cloud providers
  • Implement custom integrations between Scale AI's platform and customer data environments (cloud platforms, data warehouses, internal APIs)
  • Collaborate directly with platform teams, product teams, and our customers to develop and implement innovative infrastructure that scales to meet evolving needs
  • Deliver experiments quickly and at a high level of quality to engage our customers
  • Work across the entire product lifecycle from conceptualization through production
  • Be able and willing to multitask and learn new technologies quickly

What We’re Looking For

  • 4+ years of full-time engineering experience, post-graduation
  • Experience scaling products at hypergrowth startups
  • Experience tinkering with or productizing LLMs, vector databases, and other recent AI technologies
  • Proficient in Python or JavaScript/TypeScript, and SQL
  • Experience with Kubernetes
  • Experience with major cloud providers (AWS, Azure, GCP)
  • Excellent communication skills with the ability to explain technical concepts to both technical and non-technical audiences

Skills

Python · JavaScript · TypeScript · SQL · Kubernetes · AWS · Azure · GCP · LLMs · Vector Databases

Similar roles

DevOps / SRE jobs

Software Engineer, Platform

Design and build foundational data platforms, cloud infrastructure, and orchestration systems supporting AI/ML products. Requires 3+ years backend experience with Kubernetes, Terraform, Docker, AWS, Temporal, MongoDB, and Postgres.

216k – 270k · San Francisco, CA +1 · DevOps / SRE · On-site · 3+ YOE · AWS · Docker

AI Infrastructure Engineer - Training Platform

Builds and scales high-performance training platforms for large-scale GPU clusters, architecting orchestration, scheduling, and observability for ML workloads. Requires 5+ years in infrastructure engineering with ML focus, Kubernetes expertise, and systems programming.

216k – 270k · San Francisco, CA +2 · DevOps / SRE · On-site · 5+ YOE · Go · C++

Software Engineer, Platform

Owns and scales platform infrastructure, including edge and cloud services on Cloudflare, GCP, and Vercel and data layers such as Spanner, ClickHouse, and Postgres, to serve millions of LLM requests daily. Requires 5+ years in production infrastructure with cloud platforms, databases, and full-stack TypeScript expertise.

215k – 285k · United States · DevOps / SRE · Remote · 5+ YOE · GCP · AWS

HPC/GPU Cluster Architect

Designs, architects, and scales production GPU/HPC clusters globally. Debugs hardware and software issues, automates operations, and mentors junior engineers. Requires 5+ years of experience and a hybrid presence in San Francisco.

220k – 300k · San Francisco, CA · DevOps / SRE · Hybrid · 5+ YOE · GPU · HPC

Software Engineer, Platform

Builds and owns platform infrastructure from scratch, including CI/CD, Terraform, AWS services, monitoring, SSO, and APIs, for a scaling sales coaching startup. Requires deep experience in cloud infrastructure, IaC, containers, databases, and observability.

220k – 300k · New York, NY · DevOps / SRE · On-site · AWS · ECS