Operations Support Engineer - Lead
Bethesda, MD (Hybrid)
Summary
Leads an operations support team that resolves deployment issues, debugs microservices, and maintains SLOs on a GitLab/Kubernetes DevOps platform for NCBI developers. Requires a BS in STEM or equivalent experience, Linux and scripting skills, and strong communication for on-call support and training.
About the role
Duties & Responsibilities
- Identify and resolve operational problems in a micro-service environment
- Work with developers to resolve deployment and runtime problems
- Perform analysis and debugging work across multiple technologies
- Prioritize issues to keep applications within error budgets and meeting their SLOs
- Provide technical solutions to a wide range of problems and user requests
- Document processes, procedures, and SOPs, soliciting feedback and suggestions from team members
- Compile postmortems and action items to minimize future outages
- Interview candidates for team roles and recommend which ones to hire
- Train new team members, and assist them with issues
- Provide on-call support to NCBI's internal developers and other staff
Requirements
- BS degree in STEM or equivalent experience
- Customer-focused, team-oriented disposition
- Good systems debugging skills
- Comfortable with the Linux environment or UNIX CLI
- Experience with some programming or scripting language
- Experience creating process, procedure, and SOP documentation
- General understanding of TCP/IP, HTTP, and related protocols
- Initiative to take ownership of tasks and drive them to completion
- Comfortable dealing with users with varying levels of IT knowledge
- Eager to learn new technologies
- Strong communication and soft skills to interface with customers, peers and management
- Good judgement, sense of integrity, and responsibility
Preferred Experience / Skillsets
- Kubernetes, OpenShift, Cloud or Linux experience
- Experience with:
  - Service Reliability Engineering in any capacity
  - Linux systems administration
  - Automated CI servers, especially TeamCity and/or GitLab
  - Automation programming/scripting in any of: bash, Ruby, Python, Go, Java, Scala, Rust, C++, Perl
  - Automated configuration management, such as Puppet, Ansible, Chef, bcfg2, or cfengine (Puppet is preferred)
  - Version control systems, especially git
  - Service mesh technologies (e.g., linkerd, Istio)
  - Configuring or using monitoring and alerting technologies (TIGK stack, Grafana, Prometheus, OpsGenie)
  - Confluence, Jira, and Microsoft Office suite
  - GitOps tools, especially ArgoCD
  - Google Anthos
- Understanding of:
  - Linux internals (system calls, file systems, processes, etc.)
  - Linux network configuration
  - Linux application containerization, especially Docker
  - Attached network storage technologies
  - Cloud computing environments such as AWS, GCP, or Azure
  - Automated CI/CD pipelines
  - Distributed systems design principles
Skills
Kubernetes, Linux, GitLab, Docker, Python, bash, Puppet, Ansible, Prometheus, Grafana, AWS, GCP, Istio, ArgoCD, Jira