Customer Reliability Engineer - Infrastructure
Infrastructure-focused Customer Reliability Engineer supporting Astronomer's managed Airflow platform. Troubleshoots customer cloud/K8s environments, owns monitoring/alerting, participates in on-call, and drives reliability improvements across AWS, GCP, and Azure.
What you get to do
- Provide solutions to customers to make them successful using our products
- Troubleshoot customer environments and engage in active triaging with customers
- Participate in the on-call rotation for weekend coverage
- Provide feedback to the product development teams on customer needs and pain points
- Build out our monitoring and alerting systems
- Build and maintain automation to ensure daily operational tasks are handled as efficiently as possible
- Help direct the architecture of the products and contribute where possible
- Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide "white glove" guidance on the path to production
- Participate remotely within a fully distributed team
- Enhance and enrich customer documentation
- Work with the latest technology and multi-cloud implementations
What you bring to the role
- 5 years of experience, preferably with large, complex cloud infrastructures operating at scale
- 3 years of experience with Kubernetes
- Experience managing a production distributed system on at least one major cloud provider (AWS, GCP, Azure)
- Strong Linux experience
- Knowledge of how to operate distributed systems and monitor them for issues
- Previous experience handling customer issues (internal or external)
- Strong communication skills
- DevOps or CI/CD experience
- Python scripting
- Good troubleshooting skills
Bonus points if you have
- Experience as a Site Reliability Engineer
- Experience working with Kubernetes Custom Resources
- Depth of knowledge with Azure
- Airflow/Big Data Orchestration experience
- IaC experience
Compensation
- Estimated total compensation: $125,000 - $130,000 based on leveling and geography, along with an equity component and a comprehensive benefits package