DevOps Engineer
New York, NY | DevOps / SRE | Onsite | 3+ YOE
Summary
Owns and manages Kubernetes clusters, infrastructure as code, CI/CD pipelines, real-time data pipelines, monitoring, and production debugging for large-scale AI infrastructure. Requires 3+ years of DevOps experience with distributed systems and cloud environments.
About the role
Responsibilities
- Managing Kubernetes clusters across multiple environments and regions
- Owning infrastructure as code for all resources
- Maintaining and improving CI/CD pipelines and GitOps-based deployments
- Maintaining and optimizing real-time data pipelines that process billions of events per day across distributed queues and stream processors
- Building out monitoring, alerting, and observability
- Debugging production issues across services
- Managing cloud costs and capacity planning
- Working closely with a small engineering team and owning infrastructure end-to-end
Requirements
- 3+ years in a DevOps or platform engineering role, working in production environments
- Proven experience designing and operating large-scale, distributed systems, with a solid understanding of API design, reliability, and performance at scale
- Strong Kubernetes experience in a managed cloud environment
- Proficiency with infrastructure as code (Terraform or similar)
- Experience with GitOps-based deployment workflows
- Experience building or maintaining observability stacks (logging, metrics, alerting)
- Experience handling production incidents calmly and methodically
Nice to Have
- Multi-region deployments
- Search infrastructure
- Data pipeline experience (streaming, warehousing)
- Proxy/networking infrastructure at scale
Skills
Kubernetes, Terraform, GitOps, CI/CD, Observability, Data Pipelines, Monitoring, Alerting, Infrastructure as Code