Platform Ops Lead

135k – 165k | Bethesda, MD | Remote | Feb 20

Summary

Leads a platform operations team supporting developers on a GitLab- and Kubernetes-based DevOps platform. Resolves deployment issues, provides on-call support, trains team members, and keeps services within their SLOs in a microservices environment. Requires a BS in STEM (or equivalent experience), Linux skills, and scripting experience.

About the role

Duties and Responsibilities

Identify and resolve operational problems in a microservices environment
Work with developers to resolve deployment and runtime problems
Perform analysis and debugging work across multiple technologies
Prioritize issues so that applications stay within their error budgets and meet their SLOs
Provide technical solutions to a wide range of problems and user requests
Document processes, procedures, and SOPs, incorporating feedback and suggestions from team members
Compile postmortems and action items to minimize future outages
Interview candidates for team roles and recommend which ones to hire
Train new team members and help them resolve issues
Provide on-call support to NCBI's internal developers and other staff

Requirements

BS degree in STEM or equivalent experience
Customer-focused, team-oriented disposition
Good systems debugging skills
Comfortable working in Linux or on the UNIX command line
Experience with at least one programming or scripting language
Experience creating process, procedure, and SOP documentation
General understanding of TCP/IP, HTTP, and related protocols
Initiative to take ownership of tasks and drive them to completion
Comfortable working with users who have varying levels of IT knowledge
Eager to learn new technologies
Strong communication and interpersonal skills for working with customers, peers, and management
Good judgement, sense of integrity, and responsibility

Preferred Experience and Skillsets

Kubernetes, OpenShift, cloud, or Linux experience
Experience with:
- Site Reliability Engineering in any capacity
- Linux systems administration
- Automated CI servers, especially TeamCity and/or GitLab
- Automation programming/scripting in any of: bash, Ruby, Python, Go, Java, Scala, Rust, C++, Perl
- Automated configuration management, such as Puppet, Ansible, Chef, bcfg2, or CFEngine (Puppet is preferred)
- Version control systems, especially git
- Service mesh technologies (e.g., Linkerd, Istio)
- Configuring or using monitoring and alerting technologies (TIGK stack, Grafana, Prometheus, OpsGenie)
- Confluence, Jira, and Microsoft Office suite
- GitOps tools, especially ArgoCD
- Google Anthos
Understanding of:
- Linux internals (system calls, file systems, processes, etc.)
- Linux network configuration
- Linux application containerization, especially Docker
- Network-attached storage technologies
- Cloud computing environments such as AWS, GCP, or Azure
- Automated CI/CD pipelines
- Distributed systems design principles

Benefits and Salary

Competitive benefits package that includes medical, dental and vision coverage, 401k plan with employer contribution, paid holidays, vacation, and tuition reimbursement
Competitive salary commensurate with experience and location. The targeted range for this position is $135,000 - $165,000

Skills

Kubernetes, GitLab, Linux, Docker, Ansible, Puppet, Prometheus, Grafana, Python, bash, CI/CD, GitOps, ArgoCD, Istio, AWS
