Research Engineer - CUDA Kernel Engineering

Develops and optimizes CUDA kernels for AI models that accelerate semiconductor design, verification, and chip optimization across large-scale GPU clusters. Integrates with frameworks such as PyTorch and the latest NVIDIA hardware for training, inference, and RL workloads.

Palo Alto, CA | Backend Engineering | Onsite

About the role

Responsibilities

  • Develop, integrate, and optimize state-of-the-art CUDA kernels to power AI models that accelerate semiconductor design and verification.
  • Enable large-scale model training, inference, and reinforcement learning systems for circuit layouts, RTL generation/validation, and chip architecture optimization across thousands of GPUs.
  • Build tools, performance benchmarks, and integration layers to push GPU utilization limits for compute-intensive workloads in AI-driven hardware design.
  • Collaborate with researchers and engineers; contribute kernels and tooling to open-source AI and HPC ecosystems.

Requirements

  • Writing and optimizing CUDA kernels for large-scale AI workloads (attention, routing, graph-based operations, physics-inspired operators).
  • Profiling and optimizing GPU performance for custom compute-bound or memory-bound workloads.
  • Integrating custom kernels into training and inference frameworks (PyTorch, Megatron, vLLM, TorchTitan).
  • Working with the latest NVIDIA hardware and software (Hopper, Blackwell, NVLink, NCCL, Triton).
  • Building GPU-accelerated primitives for graph reasoning, symbolic computation, or hardware simulation.
  • Collaborating with AI researchers and semiconductor experts to translate domain-specific workloads into high-performance GPU code.

Skills

CUDA | PyTorch | NVIDIA Hopper | NVIDIA Blackwell | NVLink | NCCL | Triton Inference Server | Megatron | vLLM | TorchTitan

Backend Engineer

Backend engineer on the Data Platform team building scalable, resilient distributed services for large-scale data integration, event processing, and platform extensions. Requires 3+ years of backend experience and expertise in distributed systems, messaging, and NoSQL technologies.

Lehi, UT | Backend Engineering | Remote | 3+ YOE | C | Go

Software Engineer, Verifications Platform

Design and build backend services powering automated verification workflows, financial data integrations, and approval decisioning for lending products. Requires 3+ years building distributed systems in Kotlin or Java.

142k – 197k | United States | Backend Engineering | Remote | 3+ YOE | Java | APIs

Software Engineer

Design and build cloud backend microservices for reliable robot-to-cloud communication, fleet management, and telemetry. Requires 4+ years of experience and proficiency in TypeScript, Java, or Python.

153k – 230k | Foster City, CA | Backend Engineering | Hybrid | 4+ YOE | Java | REST

Software Engineer, Risk

Build and evolve Chime's risk platform and architecture as a backend-focused engineer on the Trust and Safety team. Requires 3+ years of production software experience and experience with Ruby on Rails or a comparable framework.

133k – 184k | Chicago, IL | Backend Engineering | Hybrid | 3+ YOE | Monitoring | Dashboards

Software Engineer, Open Source

Core maintainer of the CrewAI open-source Python framework. Designs and maintains agent orchestration APIs, reviews community contributions, and upholds engineering quality in public.

San Francisco, CA | Backend Engineering | On-site | Uv | LLMs