Software Engineer, Collective Communication
Designs and implements custom networking collectives using C++ and CUDA for efficient AI model training on supercomputers. Requires expertise in low-level performance-critical software, RDMA distributed algorithms, and GPU/CPU code.
Responsibilities
- Collaborate closely with ML researchers to design and implement efficient collective operations in C++ and CUDA.
- Ensure that our largest training jobs take full advantage of the different network transports used in our supercomputers.
- Work on simulations to inform our future supercomputer network designs.
Requirements
- Background in low-level, performance-critical software.
- Experience writing distributed algorithms using RDMA.
- Comfortable writing low-level, performance-sensitive CPU and/or GPU code.
- Familiar with network simulation techniques.
Nice-to-haves
- Experience with collective communication.
Staff + Senior Software Engineer, Cloud Inference
The Cloud Inference team is seeking Staff/Senior Software Engineers to scale and optimize Claude across multiple cloud service providers. This role involves designing, building, and owning backend services and infrastructure, collaborating cross-functionally, and ensuring reliable and cost-effective inference management at massive scale.
Software Engineer, Platform, Tinker
Builds platform systems for an AI fine-tuning API, including billing, metering, authorization (RBAC/OAuth), organizations/teams, data exports, and audit logging. Requires backend proficiency in Python/Rust and experience in billing, access control, or multi-tenant systems; bachelor's degree or equivalent.
Software Engineer, Foundations Retrieval
Builds and scales retrieval infrastructure for AI models, including indexing, serving, and query execution systems. Partners with researchers to productionize embedding techniques and supports agentic workflows across OpenAI products.
Staff+ Software Engineer, Backend
Owns scalable systems for Anthropic's API, Claude.ai, and developer tools across teams such as API Core and Enterprise Foundations. Requires 8+ years of experience leading complex projects, distributed systems, and cross-team alignment in a fast-paced AI environment.
Principal Software Engineer, B2B Engineering
Leads architecture and scaling of backend services, APIs, distributed systems, databases, and data pipelines for OpenAI's developer platform and enterprise products. Requires deep expertise in backend engineering, reliability, security, and cross-team collaboration in fast-paced environments.