Operations Support Engineer - Lead

Bethesda, MD (Hybrid)
Summary

Leads operations support team resolving deployment issues, debugging microservices, and maintaining SLOs in a GitLab/Kubernetes DevOps platform for NCBI developers. Requires BS in STEM, Linux skills, scripting, and strong communication for on-call support and training.

About the role

Duties & Responsibilities

  • Identify and resolve operational problems in a microservice environment
  • Work with developers to resolve deployment and runtime problems
  • Perform analysis and debugging work across multiple technologies
  • Prioritize issues to keep applications within their error budgets and meeting their SLOs
  • Provide technical solutions to a wide range of problems and user requests
  • Document processes, procedures, and SOPs, soliciting feedback and suggestions from team members
  • Compile postmortems and action items to minimize future outages
  • Interview candidates for team roles and decide which to recommend for hire
  • Train new team members and assist them with issues
  • Provide on-call support to NCBI's internal developers and other staff

Requirements

  • BS degree in STEM or equivalent experience
  • Customer-focused, team-oriented disposition
  • Good systems debugging skills
  • Comfortable with the Linux environment or UNIX CLI
  • Experience with at least one programming or scripting language
  • Experience creating process, procedure, and SOP documentation
  • General understanding of TCP/IP, HTTP, and related protocols
  • Initiative to take ownership of tasks and drive them to completion
  • Comfortable working with users who have varying levels of IT knowledge
  • Eager to learn new technologies
  • Strong communication and soft skills to interface with customers, peers and management
  • Good judgement, sense of integrity, and responsibility

Preferred Experience / Skillsets

  • Kubernetes, OpenShift, Cloud or Linux experience
  • Experience with:
    • Site Reliability Engineering (SRE) in any capacity
    • Linux systems administration
    • Automated CI servers, especially TeamCity and/or GitLab
    • Automation programming/scripting in any of: bash, Ruby, Python, Go, Java, Scala, Rust, C++, Perl
    • Automated configuration management, such as Puppet, Ansible, Chef, bcfg2, or CFEngine (Puppet is preferred)
    • Version control systems, especially git
    • Service Mesh technologies (e.g., linkerd, Istio)
    • Configuring or using monitoring and alerting technologies (TIGK stack, Grafana, Prometheus, OpsGenie)
    • Confluence, Jira, and Microsoft Office suite
    • GitOps tools, especially ArgoCD
    • Google Anthos
  • Understanding of:
    • Linux internals (system calls, file systems, processes, etc.)
    • Linux network configuration
    • Linux application containerization, especially Docker
    • Network-attached storage technologies
    • Cloud computing environments such as AWS, GCP, or Azure
    • Automated CI/CD pipelines
    • Distributed systems design principles

Skills

Kubernetes, Linux, GitLab, Docker, Python, bash, Puppet, Ansible, Prometheus, Grafana, AWS, GCP, Istio, ArgoCD, Jira