Senior Cloud Support Engineer

125k – 151k | Dallas, TX | Support Engineering | Onsite | 5+ YOE
Summary

Provide technical support for Crusoe Cloud's GPU compute platform, troubleshooting VMs, hardware, and scaling issues while participating in a 24/7 on-call rotation. Requires 5+ years of customer support experience and strong Linux and cloud skills.

About the role

What You’ll Be Working On

Customer Support

  • Provide exceptional technical support to customers via Zendesk, meeting SLAs and maintaining high CSAT (95%+)

On-Call Rotation

  • Participate in a 24/7 on-call rotation to ensure timely resolution of critical issues

Troubleshooting

  • Diagnose and resolve issues related to VMs, hardware failures, and scaling tests using CLI and internal tools

Alert Triage and Maintenance

  • Manage alert triage, prepare for maintenance windows, and conduct node delivery testing

Collaboration

  • Work closely with SRE, Networking, and Storage teams from initial triage to root cause analysis (RCA) delivery

Global Teamwork

  • Adhere to global team collaboration and handoff processes for ticketing and on-call procedures

Knowledge Sharing

  • Develop onboarding/training materials, knowledge base documentation, and standard operating procedures (SOPs)

What You’ll Bring to the Team

Education/Experience

  • Bachelor's degree in IT, Computer Science, Engineering, or a related field, or 4+ years of equivalent technical experience

Linux Proficiency

  • Strong command-line interface (CLI) skills in Linux environments

Version Control

  • Proficiency with Git for code management and collaboration

Customer Support Experience

  • 5+ years of experience in a customer support role, ideally within cloud, storage, or networking environments

Cloud Technologies

  • Experience with container orchestration (e.g., Kubernetes), workload management (e.g., Slurm), infrastructure as code (e.g., Terraform), and monitoring tools (e.g., Grafana)

Public Cloud Knowledge

  • Familiarity with other public cloud platforms (e.g., AWS, Azure, GCP)

Communication Skills

  • Excellent communication and customer service skills, including the ability to prioritize competing escalations

HPC Knowledge

  • Understanding of HPC technologies such as InfiniBand, RDMA, RoCE, and Software-Defined Networking (SDN)

Bonus Points

Certifications

  • CKA, CKAD, CKS, KCNA; AWS Machine Learning - Specialty, Data Analytics - Specialty, Solutions Architect - Professional, and Developer - Associate; NVIDIA AI Infrastructure and Operations, Generative AI and LLMs, Generative AI Multi-modal, and InfiniBand; Linux Foundation IT Associate and System Administrator

Cloud Expertise

  • Deep understanding of specific cloud platforms and services

Automation Skills

  • Experience with automation tools and scripting languages

Problem-Solving Abilities

  • Demonstrated ability to analyze complex technical issues and develop effective solutions

Collaboration and Mentorship

  • Proven ability to mentor, train, and onboard colleagues

Passion for Sustainability

  • A strong interest in contributing to a more sustainable future through technology

Benefits

  • Competitive compensation
  • Restricted Stock Units
  • Paid time off & paid holidays
  • Comprehensive health, dental & vision insurance
  • Employer HSA contributions
  • Paid parental leave
  • Paid life insurance, short-term and long-term disability
  • Professional development & tuition reimbursement
  • Mental health & wellness support
  • Commuter benefits (parking & transit)
  • Cell phone stipend
  • 401(k) retirement plan with company match up to 4% of salary
  • Volunteer time off

Compensation

  • $125,000 - $151,000 + Bonus
  • Restricted Stock Units included in all offers

Skills

Linux, Git, Kubernetes, Terraform, Slurm, Grafana, AWS, Azure, GCP, InfiniBand, RDMA, RoCE, SDN, Zendesk