Software Engineer, Collective Communication

380k – 555k | San Francisco, CA | Hybrid
Summary

Designs and implements custom networking collectives in C++ and CUDA for efficient AI model training on supercomputers. Requires expertise in low-level, performance-critical software, RDMA-based distributed algorithms, and CPU/GPU code.

About the role

Responsibilities

  • Collaborate closely with ML researchers to design and implement efficient collective operations in C++ and CUDA.
  • Ensure that our largest training jobs take full advantage of the different network transports used in our supercomputers.
  • Work on simulations to inform our future supercomputer network designs.

Requirements

  • Background in low-level, performance-critical software.
  • Experience writing distributed algorithms using RDMA.
  • Comfortable writing low-level, performance-sensitive CPU and/or GPU code.
  • Familiar with network simulation techniques.

Nice-to-haves

  • Experience with collective communication.

Skills

C++, CUDA, RDMA, distributed algorithms, network simulation, collective communication, GPU programming, network transports