Associate Systems Software Engineer

137k – 161k · San Francisco, CA · Onsite · Entry level · Apr 8

Summary

Develops Linux-based compute applications for managing virtualization stacks across AI compute servers, integrates with AI hardware like GPUs and NICs, and optimizes performance for AI/ML workloads in datacenters. Requires Linux kernel familiarity, systems programming, and hardware integration skills.

About the role

What You’ll Be Working On

Compute Application Development & Scaleout: Design highly reliable and performant Linux applications used to manage our virtualization stack across thousands of AI compute servers in multiple global datacenters.
AI Hardware Platform Integration: Integrate Crusoe applications with a wide variety of hardware and software AI chip-vendor stacks. Build solutions to optimize and monitor virtualized hardware (GPUs, InfiniBand/RoCE NICs, Ephemeral Storage, etc.) in cutting-edge AI/HPC environments.
Kernel & Hypervisor Integration: Work side by side with our Linux Kernel and Hypervisor teams to ensure our Crusoe applications are seamlessly integrated with a variety of kernels and hypervisors.
Performance Analysis & Tuning: Analyze and enhance the performance of the entire virtualization stack, from the hypervisor to the virtualized guest OS, with a specific focus on optimizing AI/ML workloads. This includes profiling, bottleneck identification, and implementing low-level optimizations.
System-Level Troubleshooting: Diagnose and resolve complex system issues across our virtualization stack (drivers, kernel, hypervisor, guest OS, and Crusoe applications). Work closely with kernel and hypervisor teams to debug and resolve integration challenges.
Code Review and Quality Assurance: Conduct thorough code reviews to ensure the highest level of software quality, reliability, and security within compute applications and the virtualization stack.
Cross-Functional Collaboration: Collaborate with other engineering teams, including hardware design, OS development, and AI/ML application teams, to ensure cohesive and integrated product development.
Technical Leadership: Provide technical guidance and mentorship to junior engineers, fostering a culture of technical excellence and collaborative problem-solving within the compute applications team.

What You’ll Bring to the Team

Linux Systems Familiarity: Experience building applications on Linux kernels, specifically pertaining to virtualization, device drivers, memory management, and process scheduling.
Hardware Integration: Solid understanding of hardware devices such as GPUs, CPUs, InfiniBand and Ethernet NICs, Ephemeral Disks, and PCI Express.
Systems Design: Strong grasp of distributed applications and highly scalable systems design, with a specific focus on communication protocols (gRPC, REST, TCP/IP, etc.), databases (Postgres, Redis), and messaging systems (Pub/Sub, Kafka).
Software Architecture: Strong experience building software applications at both higher levels (Golang, Java, Python) and lower levels (C, C++, Rust). A keen eye for clean, maintainable code and a unit-test-driven mindset.
Excellent Communication Skills: Ability to collaborate with teams across the organization, block out noise, and focus on what needs to get done to get a project across the line.
Rapid and Agile Learner: Adapts quickly, is eager to research new technologies, and does not get overwhelmed by unfamiliar tech stacks.
Virtualization Concepts: General knowledge of hypervisors, virtual machine lifecycles, and Linux KVM tooling.
CI/CD and Validation: Understanding of how to build GitLab or GitHub CI/CD pipelines that reliably validate and deliver code across many compute platforms.

Bonus Points

Experience with virtualization specifically for AI/ML workloads, including GPU virtualization.
Previous work debugging or contributing to kernel or hypervisor code, particularly around device management.
Experience with configuring thousands of live compute nodes in a bare-metal production environment.

Benefits

Competitive compensation
Restricted Stock Units
Paid time off & paid holidays
Comprehensive health, dental & vision insurance
Employer contributions to a Health Savings Account (HSA)
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) retirement plan with company match up to 4% of salary

Compensation

Compensation will be paid in the range of $137,000 – $161,000. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Skills

Linux, virtualization, Linux KVM, Golang, C, C++, Rust, GPUs, InfiniBand, Kubernetes, gRPC, Postgres, Redis, Kafka, GitLab CI/CD
