Associate Systems Software Engineer
Develops Linux-based compute applications for managing virtualization stacks across AI compute servers, integrates with AI hardware like GPUs and NICs, and optimizes performance for AI/ML workloads in datacenters. Requires Linux kernel familiarity, systems programming, and hardware integration skills.
What You’ll Be Working On
- Compute Application Development & Scaleout: Design highly reliable and performant Linux applications used to manage our virtualization stack across thousands of AI compute servers in multiple global datacenters.
- AI Hardware Platform Integration: Integrate Crusoe applications with a wide variety of hardware and software AI chip-vendor stacks. Build solutions to optimize and monitor virtualized hardware (GPUs, InfiniBand/RoCE NICs, Ephemeral Storage, etc.) in cutting-edge AI/HPC environments.
- Kernel & Hypervisor Integration: Work side by side with our Linux Kernel and Hypervisor teams to ensure our Crusoe applications are seamlessly integrated with a variety of kernels and hypervisors.
- Performance Analysis & Tuning: Analyze and enhance the performance of the entire virtualization stack, from the hypervisor to the virtualized guest OS, with a specific focus on optimizing AI/ML workloads. This includes profiling, bottleneck identification, and implementing low-level optimizations.
- System-Level Troubleshooting: Diagnose and resolve complex system issues across our virtualization stack (drivers, kernel, hypervisor, guest OS, and Crusoe applications). Work closely with kernel and hypervisor teams to debug and resolve integration challenges.
- Code Review and Quality Assurance: Conduct thorough code reviews to ensure the highest level of software quality, reliability, and security within our compute applications and virtualization stack.
- Cross-Functional Collaboration: Collaborate with other engineering teams, including hardware design, OS development, and AI/ML application teams, to ensure cohesive and integrated product development.
- Technical Leadership: Provide technical guidance and mentorship to junior engineers, fostering a culture of technical excellence and collaborative problem-solving within the compute applications team.
What You’ll Bring to the Team
- Linux Systems Familiarity: Experience building applications on Linux kernels, specifically pertaining to virtualization, device drivers, memory management, and process scheduling.
- Hardware Integration: Solid understanding of hardware devices such as GPUs, CPUs, InfiniBand and Ethernet NICs, Ephemeral Disks, and PCI Express.
- Systems Design: Strong grasp of distributed applications and highly scalable systems design, with a specific focus on communications protocols (gRPC, REST, TCP/IP, etc.), databases (Postgres, Redis), and systems design applications (Pub/Sub, Kafka).
- Software Architecture: Strong experience building software applications in both higher-level (Golang, Java, Python) and lower-level (C, C++, Rust) languages. Keen eye for clean, maintainable code, and a unit-test-driven mindset.
- Excellent Communication Skills: Ability to collaborate with teams across an organization, block out noise, and focus on what needs to get done to get a project across the line.
- Rapid and Agile Learner: Capable of adapting quickly and eager to research new technology without getting overwhelmed by unfamiliar tech stacks.
- Virtualization Concepts: General knowledge of hypervisors, virtual machine lifecycles, and Linux KVM tooling.
- CI/CD and Validation: Understanding of how to build GitLab or GitHub CI/CD pipelines that deliver bug-free code across a multitude of compute platforms.
Bonus Points
- Experience with virtualization specifically for AI/ML workloads, including GPU virtualization.
- Previous work debugging or contributing to kernel or hypervisor code, particularly around device management.
- Experience with configuring thousands of live compute nodes in a bare-metal production environment.
Benefits
- Competitive compensation
- Restricted Stock Units
- Paid time off & paid holidays
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) Retirement plan with company match up to 4% of salary
Compensation
Compensation will be paid in the range of $137,000 - $161,000. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.