Engineering Manager, Model Inference
Engineering Manager leading the Model Inference team, responsible for architecting and scaling low-latency, high-throughput LLM serving infrastructure and growing a team of AI inference engineers.
What You’ll Do
- Lead and grow a high-performing team of AI inference engineers focused on building and scaling infrastructure for Abridge’s products and APIs
- Own the technical direction of our inference systems—making key decisions around batching, throughput, latency, and GPU utilization
- Architect and scale inference infrastructure for reliability, efficiency, and observability; lead incident response
- Benchmark the inference stack end to end and eliminate bottlenecks wherever they appear
- Partner with ML Research teams on model optimization, quantization, and deployment
- Develop APIs for AI inference used by both internal teams and external customers
- Recruit, mentor, and develop engineering talent; establish team processes, engineering standards, and operational excellence
- Work closely with the GenAI Platform, Data, and Product teams to plan and execute projects that directly impact clinicians and patients
What You’ll Bring
- 5+ years of engineering experience with 1+ years in a technical leadership or management role
- Deep, hands-on experience with ML systems and inference frameworks (e.g., PyTorch, TensorRT, vLLM, TensorFlow)
- Strong understanding of LLM architecture (e.g., Multi-Head Attention, Multi-Query/Grouped-Query Attention, and other common transformer components)
- Experience with inference optimizations (e.g., batching, quantization, kernel fusion, FlashAttention)
- Familiarity with GPU characteristics, roofline models, and performance analysis
- Experience deploying reliable, distributed, real-time systems at scale
- Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
- Skilled at hiring and mentorship, with a demonstrated track record of helping engineers grow their skills and careers
- Strong technical communication and cross-functional collaboration skills
- Comfortable giving constructive feedback on technical designs and in code reviews
- Track record of thriving in a fast-growing startup, with the ability to operate with urgency and focus
Added Bonus
- Background in training infrastructure and RL workloads
- Skilled in building secure, compliant systems on major cloud platforms (GCP preferred, AWS experience welcome)
- Experience with Kubernetes and container orchestration at scale
- Published work or contributions to inference optimization research
Senior Director, Engineering - Agentic Business Systems
Lead the internal AI platform and agentic workflow deployment across business functions. Own infrastructure, ship high-impact automations, and manage a mixed engineering/product/business team. Reports to the CEO.
Senior Staff Software Engineer, Managed Platform Services
Senior technical leader anchoring distributed systems depth across Crusoe Cloud's Managed Platform Services. Owns performance engineering, operational excellence, and long-term architecture for 10x scale across all platform domains.
Verification and Validation Manager - Autonomy Trucking
Lead and grow a team defining validation strategies for L4 autonomy programs in trucking. Set technical direction, mentor engineers, and enable safe fleet deployment across the US and Japan.
Manager, Software Engineering, Search Discovery
Lead and grow a senior engineering team building dashboards, notebooks, visualizations, and AI-assisted investigation tools for Cribl Search. Partner with Product on roadmaps and mentor staff+ engineers in a fast-paced environment.
Engineering Manager
Engineering Manager for a Core Product team building AI-powered scheduling, payments, and client management systems. Owns execution, team health, and AI tooling adoption while partnering closely with Product and Design.