# Member of Technical Staff, AI Training Infrastructure
**Company:** [Fireworks AI](https://hotfix.jobs/companies/fireworks-ai)
**Location:** San Mateo, CA
**Salary:** $175K-$220K
**Skills:** Python, C++, Kubernetes, Distributed Systems, GPU Programming, PyTorch, TensorFlow, MLOps, AWS, GCP
**Posted:** 2026-03-05
> Builds and optimizes scalable infrastructure for AI model training on large GPU clusters. Requires expertise in distributed systems, Python/C++, and ML frameworks.
## Job Description
## Responsibilities

- Design, build, and maintain scalable infrastructure for AI model training pipelines.
- Optimize distributed training systems for large-scale GPU clusters.
- Collaborate with AI researchers and engineers to deploy training workflows efficiently.

## Requirements

- Strong experience with distributed systems and high-performance computing.
- Proficiency in Python, C++, and infrastructure-as-code tools.
- Deep knowledge of AI training frameworks and GPU optimization.

## Nice-to-Haves

- Experience with Kubernetes and cloud platforms like AWS or GCP.
- Background in machine learning operations (MLOps).

**Apply:** https://hotfix.jobs/jobs/member-of-technical-staff-ai-training-infrastructure-at-fireworks-ai-3b60d4a2-35a8-4163-8505-a2811ac7ceae
**Canonical:** https://hotfix.jobs/jobs/member-of-technical-staff-ai-training-infrastructure-at-fireworks-ai-3b60d4a2-35a8-4163-8505-a2811ac7ceae