DevOps Engineer

New York, NY | DevOps / SRE | Onsite | 3+ YOE
Summary

Owns and manages Kubernetes clusters, infrastructure as code, CI/CD pipelines, real-time data pipelines, monitoring, and production debugging for large-scale AI infrastructure. Requires 3+ years of DevOps experience with distributed systems and cloud environments.

About the role

Responsibilities

  • Managing Kubernetes clusters across multiple environments and regions
  • Owning infrastructure as code for all resources
  • Maintaining and improving CI/CD pipelines and GitOps-based deployments
  • Maintaining and optimizing real-time data pipelines that process billions of events per day across distributed queues and stream processors
  • Building out monitoring, alerting, and observability
  • Debugging production issues across services
  • Managing cloud costs and capacity planning
  • Working closely with a small engineering team — owning infra end-to-end

Requirements

  • 3+ years in a DevOps or platform engineering role, working in production environments
  • Proven experience designing and operating large-scale, distributed systems, with a solid understanding of API design, reliability, and performance at scale
  • Strong Kubernetes experience in a managed cloud environment
  • Proficiency with infrastructure as code (Terraform or similar)
  • Experience with GitOps-based deployment workflows
  • Experience building or maintaining observability stacks (logging, metrics, alerting)
  • Experience handling production incidents calmly and methodically

Nice to Have

  • Multi-region deployments
  • Search infrastructure
  • Data pipeline experience (streaming, warehousing)
  • Proxy/networking infrastructure at scale

Skills

Kubernetes, Terraform, GitOps, CI/CD, Observability, Data Pipelines, Monitoring, Alerting, Infrastructure as Code