Senior Cloud Support Engineer
Provide technical support for Crusoe Cloud's GPU compute platform, troubleshooting VMs, hardware, and scaling issues while participating in 24/7 on-call rotations. Requires 5+ years of customer support experience and strong Linux and cloud skills.
What You’ll Be Working On
Customer Support
- Provide exceptional technical support to customers via Zendesk, meeting SLAs and maintaining a high customer satisfaction (CSAT) score of 95% or above
On-Call Rotation
- Participate in a 24/7 on-call rotation to ensure timely resolution of critical issues
Troubleshooting
- Diagnose and resolve issues involving VMs, hardware failures, and scaling tests using the CLI and internal tools
Alert Triage and Maintenance
- Triage alerts, prepare for maintenance windows, and conduct node delivery testing
Collaboration
- Work closely with SRE, Networking, and Storage teams from initial triage to root cause analysis (RCA) delivery
Global Teamwork
- Follow the global team's collaboration and handoff processes for ticketing and on-call procedures
Knowledge Sharing
- Develop onboarding/training materials, knowledge base documentation, and standard operating procedures (SOPs)
What You’ll Bring to the Team
Education/Experience
- Bachelor's degree in IT, Computer Science, Engineering, or a related field, or 4+ years of equivalent technical experience
Linux Proficiency
- Strong command-line interface (CLI) skills in Linux environments
Version Control
- Proficiency with Git for code management and collaboration
Customer Support Experience
- 5+ years of experience in a customer support role, ideally within cloud, storage, or networking environments
Cloud Technologies
- Experience with container orchestration (e.g., Kubernetes), workload management (e.g., Slurm), infrastructure as code (e.g., Terraform), and monitoring tools (e.g., Grafana)
Public Cloud Knowledge
- Familiarity with other public cloud platforms (e.g., AWS, Azure, GCP)
Communication Skills
- Excellent communication and customer service skills, including the ability to prioritize competing escalations
HPC Knowledge
- Understanding of HPC technologies such as InfiniBand, RDMA, RoCE, and software-defined networking (SDN)
Bonus Points
Certifications
- Kubernetes: CKA, CKAD, CKS, KCNA; AWS: Machine Learning - Specialty, Data Analytics - Specialty, Solutions Architect - Professional, Developer - Associate; NVIDIA: AI Infrastructure and Operations, Generative AI and LLMs, Generative AI Multimodal, InfiniBand; Linux Foundation: IT Associate, System Administrator
Cloud Expertise
- Deep understanding of one or more cloud platforms and their services
Automation Skills
- Experience with automation tools and scripting languages
Problem-Solving Abilities
- Demonstrated ability to analyze complex technical issues and develop effective solutions
Collaboration and Mentorship
- Proven ability to mentor, train, and onboard colleagues
Passion for Sustainability
- A strong interest in contributing to a more sustainable future through technology
Benefits
- Competitive compensation
- Restricted Stock Units
- Paid time off & paid holidays
- Comprehensive health, dental & vision insurance
- Employer contributions to a health savings account (HSA)
- Paid parental leave
- Paid life insurance and short- and long-term disability insurance
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) retirement plan with a company match of up to 4% of salary
- Volunteer time off
Compensation
- $125,000 - $151,000 + Bonus
- Restricted Stock Units included in all offers