AI System Research and Development Engineer - Optimization

Develop and optimize GPU kernels and deep learning systems for LLM training and inference at Snowflake AI Research. Requires 5+ years in GPU/HPC optimization and strong proficiency in PyTorch, TensorFlow, JAX, and CUDA.

200k – 265k · Bellevue, WA · ML Engineering · Onsite · 5+ YOE

About the role

Responsibilities

Analyze and optimize GPU kernel performance for training and inference of LLMs
Develop and implement strategies to enhance the efficiency and scalability of deep learning systems
Profile and benchmark deep learning systems to identify performance bottlenecks
Design and implement optimizations to reduce latency and improve resource utilization for training and inference
Stay updated with the latest advancements in GPU kernel optimization, deep learning, and LLM system development
Contribute to the development of agentic frameworks and applications for LLM-driven workflows, enhancing automation, reasoning, and decision-making capabilities
Open-source and publish innovations, optimizations, and engineering practices in technical blogs and at top-tier conferences and journals

Requirements

Bachelor’s degree in Computer Science, Electrical Engineering, or a related field (Master’s degree or PhD preferred)
5+ years of experience in GPU kernel optimization, deep learning system optimization, or high-performance computing (HPC)
Proficiency in deep learning frameworks such as PyTorch, TensorFlow, and JAX
Strong understanding of GPU architectures and experience with CUDA or similar frameworks
Experience with frameworks like CUTLASS, Triton, cuDNN, etc.
Experience with profiling tools (e.g., nvprof, Nsight) and performance analysis methodologies
Solid problem-solving skills and ability to debug complex performance issues
Excellent communication skills and ability to work effectively in a cross-functional team environment

Skills

PyTorch, TensorFlow, JAX, CUDA, CUTLASS, Triton, cuDNN, nvprof, Nsight, GPU Kernel Optimization
