Performance Modeling Engineer
Develops and maintains performance modeling tools and frameworks to evaluate AI system behavior and analyze tradeoffs across compute, memory, networking, and storage. Collaborates with architects on simulations that inform infrastructure design. Requires a strong software or modeling background and knowledge of system architecture.
Key Responsibilities
- Develop and maintain performance modeling tools and frameworks.
- Build models to evaluate system behavior across compute, memory, and interconnect subsystems, as well as distributed system scaling and bottlenecks.
- Run simulations and analytical models to support architectural tradeoff analysis.
- Collaborate with the performance modeling lead and system architects to answer forward-looking design questions.
- Analyze and interpret modeling outputs, translating results into actionable insights.
- Validate models against real system measurements and workload behavior.
- Contribute to improving modeling fidelity, usability, and scalability.
Qualifications
- Strong software engineering or modeling background (e.g., simulation, systems modeling, or performance analysis).
- Familiarity with system architecture fundamentals (compute, memory, networking).
- Experience with programming and building technical tools or frameworks.
- Ability to reason about performance bottlenecks and scaling behavior.
- Strong analytical skills and comfort working with quantitative models.
- Ability to collaborate across teams and learn new system domains quickly.
Preferred Skills
- Exposure to AI/ML workloads or distributed systems.
- Experience with simulation tools, performance modeling, or systems analysis.
- Familiarity with data center infrastructure or large-scale systems.
- Experience working with performance data, benchmarking, or profiling tools.
- Interest in system architecture and hardware/software co-design.
Principal Infrastructure Engineer
Builds and operates secure cloud-native and edge platforms for military collaboration software. Requires 8+ years of production infrastructure experience, deep Kubernetes expertise, and the ability to obtain SECRET clearance.
Staff Engineer, Distributed Storage and HPC & AI Infrastructure
Design and operate multi-petabyte distributed storage systems for large-scale AI training and inference, integrating parallel filesystems and building Kubernetes-native storage platforms.
Director of Platform & Reliability Engineering
The Director of Platform & Reliability Engineering will lead an engineering organization responsible for secure, scalable, and highly reliable products. This role involves setting the vision for internal platforms, cloud infrastructure, developer enablement, and production operations.
Staff Site Reliability Engineer
Zoox is seeking a Staff Site Reliability Engineer to lead source control, owning the technical strategy and roadmap for its Git-based monorepo. This role involves migrating from GitHub Enterprise to GitHub Cloud, building developer tooling, and partnering with various teams to enhance source control as a strategic asset.