Intermediate AI-Enabled DevOps Engineer

Builds and operates cloud infrastructure and CI/CD pipelines for AI-enabled workloads, focusing on automation, reliability, and containerized deployments in Kubernetes. Requires 2-4+ years of DevOps experience, including IaC, scripting, and at least one cloud provider such as AWS, Azure, or GCP.

106k – 118k · United States · DevOps / SRE · Remote · 2+ YOE

About the role

Responsibilities

Infrastructure & Platform Support

  • Implement and maintain cloud infrastructure using infrastructure-as-code.
  • Support and improve CI/CD pipelines for application and ML workloads.
  • Operate containerized workloads in Kubernetes or managed container platforms.
  • Assist in rolling out platform improvements and infrastructure changes.

Automation & AI Exposure

  • Automate repetitive operational tasks using scripts and tooling.
  • Assist with deploying and monitoring AI/ML workloads.
  • Use AI-assisted tools to improve:
    • Log analysis and alert triage
    • Incident investigation and documentation
    • Deployment validation and testing

Reliability & Operations

  • Participate in on-call rotations.
  • Respond to and help resolve environment and CI/CD pipeline incidents.
  • Monitor system health using dashboards, logs, and alerts.
  • Contribute to post-incident reviews and follow-up actions.

Collaboration & Learning

  • Follow and contribute to established DevOps standards and practices.
  • Document procedures, runbooks, and operational knowledge.
  • Demonstrate strong interest in and motivation for growing technical skills.

Requirements

Must-haves

  • 2-4+ years of experience in DevOps, Cloud, SRE, or Infrastructure Engineering.
  • Experience with at least one cloud provider (Azure, AWS, or GCP).
  • Hands-on exposure to:
    • Scripting (Bash, Python, or similar)
    • Infrastructure as Code (Terraform, IaC 2.0, or similar)
    • CI/CD pipelines
    • Containers (Docker) and Kubernetes concepts
    • Monitoring, logging, and alerting tools

Nice-to-haves

  • 2+ years in healthcare technology or other highly regulated SaaS environments (financial services, government).
  • Familiarity with AI-driven monitoring or AIOps tools.
  • Experience using AI-assisted development or operations tools.
  • Understanding of cloud networking and security fundamentals.
  • Experience supporting on-call rotations.

Education

  • Bachelor's degree in Computer Science, Computer Engineering, or related technical field.
  • Advanced certifications (AWS/GCP/Azure professional-level, CKA, CKS, CISSP) are highly valued.
  • Demonstrated commitment to continuous learning.

Skills

Kubernetes, Docker, Terraform, CI/CD, AWS, Azure, GCP, Python, Bash, Monitoring Tools
