Intermediate AI-Enabled DevOps Engineer

Builds and operates cloud infrastructure and CI/CD pipelines for AI-enabled workloads, focusing on automation, reliability, and containerized deployments in Kubernetes. Requires 2-4+ years of DevOps experience, plus hands-on work with IaC, scripting, and a cloud provider such as AWS, Azure, or GCP.

106k – 118k | United States | DevOps / SRE | Remote | 2+ YOE

About the role

Responsibilities

Infrastructure & Platform Support

Implement and maintain cloud infrastructure using infrastructure-as-code.
Support and improve CI/CD pipelines for application and ML workloads.
Operate containerized workloads in Kubernetes or managed container platforms.
Assist in rolling out platform improvements and infrastructure changes.

Automation & AI Exposure

Automate repetitive operational tasks using scripts and tooling.
Assist with deploying and monitoring AI/ML workloads.
Use AI-assisted tools to improve:
- Log analysis and alert triage
- Incident investigation and documentation
- Deployment validation and testing

Reliability & Operations

Participate in on-call rotations.
Respond to and help resolve environment & CI/CD pipeline incidents.
Monitor system health using dashboards, logs, and alerts.
Contribute to post-incident reviews and follow-up actions.

Collaboration & Learning

Follow and contribute to established DevOps standards and practices.
Document procedures, runbooks, and operational knowledge.
Demonstrate strong interest in and motivation for growing technical skills.

Requirements

Must-haves

2-4+ years of experience in DevOps, Cloud, SRE, or Infrastructure Engineering.
Experience with at least one cloud provider (Azure, AWS, or GCP).
Hands-on exposure to:
- Scripting (Bash, Python, or similar)
- Infrastructure as Code (Terraform, IaC 2.0, or similar)
- CI/CD pipelines
- Containers (Docker) and Kubernetes concepts
- Monitoring, logging, and alerting tools

Nice-to-haves

2+ years in healthcare technology or other highly regulated SaaS environments (financial services, government).
Familiarity with AI-driven monitoring or AIOps tools.
Experience using AI-assisted development or operations tools.
Understanding of cloud networking and security fundamentals.
Experience supporting on-call.

Education

Bachelor's degree in Computer Science, Computer Engineering, or related technical field.
Advanced certifications (AWS/GCP/Azure professional-level, CKA, CKS, CISSP) are highly valued.
Demonstrated commitment to continuous learning.

Skills

Kubernetes, Docker, Terraform, CI/CD, AWS, Azure, GCP, Python, Bash, Monitoring Tools
