
Senior DevOps Engineer (Infrastructure & MLOps)

Designs and manages scalable AWS infrastructure, builds CI/CD pipelines for web and ML applications, implements MLOps practices, and ensures reliability through monitoring. Requires 5+ years of DevOps experience, AWS expertise, containerization experience, and Python scripting skills.

180k – 225k · United States · DevOps / SRE · Remote · 5+ YOE

About the role

Key Responsibilities

  • Design, implement, and manage highly available infrastructure for our cloud-based platforms (AWS).
  • Work with our internal engineering teams to architect and support AI/ML infrastructure, specifically managing AWS Lambda infrastructure and some legacy SageMaker environments used for model training, hosting, and inference.
  • Create and automate robust deployment pipelines using CI/CD tools (GitLab CI/CD or GitHub Actions) for both web applications and machine learning models.
  • Build, maintain, and scale containerized applications with Docker and ECS/Fargate.
  • Implement MLOps best practices to streamline the transition of models from development to production.
  • Ensure system scalability and reliability through proactive monitoring, logging, and automated alerting.
  • Collaborate with both Product Engineers and Data Scientists to optimize performance, security, and infrastructure costs.
  • Manage and evolve our Infrastructure as Code (IaC) footprint.

Qualifications

  • 5+ years of experience in a DevOps or infrastructure role.
  • Expert knowledge of cloud platforms such as AWS, GCP, or Azure.
  • Strong experience with containerization technologies (Docker, ECS / Kubernetes).
  • Proven track record of designing and managing complex CI/CD pipelines.
  • Experience with MLOps workflows (model versioning, retraining pipelines, or feature stores).
  • Hands-on experience with monitoring and logging tools (Datadog, Prometheus, Grafana, MLflow).
  • Expertise in scripting languages (Python required, along with Bash, Go, or similar).
  • Proficiency with infrastructure automation tools (Terraform, Ansible, or CloudFormation).
  • Excellent communication skills and the ability to bridge the gap between traditional DevOps and Data Science teams.

Preferred Skills

  • Specific experience with AWS Lambda, SageMaker, and other related AI/ML services.
  • Experience with database management (SQL/NoSQL, including Google BigQuery) and data pipeline orchestration.
  • Knowledge of security best practices specifically regarding data privacy in AI and healthcare (HIPAA compliance).
  • Previous experience in fast-paced, agile startup environments.

Perks

  • Competitive salaries
  • Remote/hybrid environment
  • Potential equity compensation for outstanding performance
  • Flexible PTO
  • Company-wide sponsored lunches
  • Company-paid disability and life insurance benefits
  • Company-paid family and medical leave
  • Medical, dental, and vision insurance benefits
  • Discounted pet insurance
  • FSA/DCA and commuter benefits
  • 401(k)
  • Complimentary subscription to digital fitness classes and wellness content
  • Recovery suite at HQ – includes a cold plunge, sauna, and shower

Skills

AWS · Docker · Kubernetes · Terraform · CI/CD · MLOps · Python · GitLab · GitHub Actions · ECS · Fargate · SageMaker · AWS Lambda · Datadog · Prometheus
