Platform Ops Lead

135k – 165k | Bethesda, MD | Remote

Summary

Leads the platform operations team that supports developers on a GitLab and Kubernetes-based DevOps platform. Resolves deployment issues, manages on-call support, trains team members, and keeps applications within their SLOs in a microservices environment. Requires a BS in STEM or equivalent experience, Linux skills, and scripting experience.

About the role

Duties and Responsibilities

  • Identify and resolve operational problems in a microservices environment
  • Work with developers to resolve deployment and runtime problems
  • Perform analysis and debugging work across multiple technologies
  • Prioritize issues to keep applications within error budgets and meeting their SLOs
  • Provide technical solutions to a wide range of problems and user requests
  • Document processes, procedures and SOPs by soliciting feedback and suggestions from team members
  • Compile postmortems and action items to minimize future outages
  • Interview candidates for team roles and decide which ones to recommend for hire
  • Train new team members, and assist them with issues
  • Provide on-call support to NCBI's internal developers and other staff

Requirements

  • BS degree in STEM or equivalent experience
  • Customer-focused, team-oriented disposition
  • Good systems debugging skills
  • Comfortable with the Linux environment or UNIX CLI
  • Experience with some programming or scripting language
  • Experience creating processes, procedures and SOP documentation
  • General understanding of TCP/IP, HTTP, and related protocols
  • Initiative to take ownership of tasks and drive them to completion
  • Comfortable dealing with users with varying levels of IT knowledge
  • Eager to learn new technologies
  • Strong communication and soft skills to interface with customers, peers and management
  • Good judgement, sense of integrity, and responsibility

Preferred Experience and Skillsets

  • Kubernetes, OpenShift, Cloud or Linux experience

  • Experience with:

    • Site Reliability Engineering in any capacity
    • Linux systems administration
    • Automated CI servers, especially TeamCity and/or GitLab
    • Automation programming/scripting in any of: bash, Ruby, Python, Go, Java, Scala, Rust, C++, Perl
    • Automated configuration management, such as Puppet, Ansible, Chef, bcfg2, cfengine, etc. (Puppet is preferred)
    • Version control systems, especially git
    • Service Mesh technologies (e.g., linkerd, Istio)
    • Configuring or using monitoring and alerting technologies (TIGK stack, Grafana, Prometheus, OpsGenie)
    • Confluence, Jira, and Microsoft Office suite
    • GitOps tools, especially ArgoCD
    • Google Anthos
  • Understanding of:

    • Linux internals (system calls, file systems, processes, etc.)
    • Linux network configuration
    • Linux application containerization, especially Docker
    • Network-attached storage technologies
    • Cloud computing environments such as AWS, GCP or Azure
    • Automated CI/CD pipelines
    • Distributed systems design principles

Benefits and Salary

  • Competitive benefits package that includes medical, dental and vision coverage, 401k plan with employer contribution, paid holidays, vacation, and tuition reimbursement
  • Competitive salary commensurate with experience and location. The targeted range for this position is $135,000 - $165,000

Skills

Kubernetes, GitLab, Linux, Docker, Ansible, Puppet, Prometheus, Grafana, Python, bash, CI/CD, GitOps, ArgoCD, Istio, AWS