ChatGPT Performance Engineer
The Performance Engineer optimizes infrastructure and application performance for ChatGPT and the OpenAI API, focusing on latency, throughput, and efficiency at scale. The role requires 7+ years of experience in high-scale systems, with expertise in profiling, tracing, and cross-layer optimization.
Responsibilities
- Analyze and optimize performance across the application, middleware, runtime, and infrastructure layers, including networking, storage, the Python runtime, and GPU utilization, among others.
- Develop tooling and metrics that provide deep observability into system performance.
- Collaborate closely with infra, platform, training, and product teams to identify key performance goals and drive systemic improvements.
- Influence architecture and design decisions to prioritize latency, throughput, and efficiency at scale.
- Lead investigations into high-impact performance regressions or scalability issues in production.
- Drive performance testing strategies and help define SLAs/SLOs around latency and throughput for critical systems.
Requirements
- 7+ years of experience in software engineering with a strong track record in performance or reliability of high-scale distributed systems.
- Deeply comfortable with performance profiling tools and tracing systems.
- Experience optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Python/Golang internals, GPU utilization).
- Strong understanding of OS internals, scheduling, memory management, and IO patterns.
- Experience contributing to observability, benchmarking, or performance-focused infrastructure at scale.
- Demonstrated success navigating ambiguity and aligning stakeholders around performance goals.
- Value simplicity, rigor, and collaboration when solving complex systems problems.
Staff Software Engineer, Infrastructure Asset Systems
As a Staff Software Engineer, you will build and extend systems for tracking, governing, and reporting on infrastructure assets. This involves designing data models, workflow engines, and integrations with financial and procurement systems, ensuring compliance and auditability.
Senior Manager, Network Engineering & Infrastructure
Lead and mentor a network engineering team responsible for designing, deploying, and operating multi-site enterprise network infrastructure across data centers, cloud, offices, and vehicle facilities. Requires 10+ years of network experience with 5+ years in senior leadership.
Performance Engineer, Inference Systems
Performance engineer focused on cross-layer investigations of Anthropic's inference fleet for Claude, optimizing throughput, latency, reliability, and correctness while building observability and partnering with kernel and serving teams.
Tech Lead, Deployment & Operations — Custom Infrastructure
Lead deployment and operations for OpenAI’s custom silicon and systems into data center environments. Drive hardware bring-up, validation, production deployment, and fleet reliability at scale while leading a technical team.
Staff Fiber Network Engineer
Owns the end-to-end physical layer of a private global dark-fiber backbone network, including route design, fiber acquisition, vendor management, acceptance testing, and lifecycle management. Requires deep OSP/fiber expertise, optical transport knowledge, and 8+ years of experience building fiber programs.