Member of Technical Staff, Evals & Post-Training Product

San Mateo, CA · Fullstack Engineering · Onsite

Summary

Develops evaluation workflows and fine-tuning products that improve LLMs, working across the full stack from APIs to web interfaces. Requires 1-7 years of software engineering experience, hands-on experience with LLM evals or post-training, and user-centric product skills.

About the role

Key Responsibilities

  • Build internal eval workflows: Design and scale evaluation tooling used by internal teams to measure model quality, compare model changes, and inform post-training decisions.
  • Own fine-tuning product experiences: Build and improve user-facing product workflows for post-training, including fine-tuning experiences across supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and related model-improvement capabilities.
  • Work closely with users: Partner with customers and internal stakeholders to understand evaluation and fine-tuning needs, support high-priority engagements, triage issues, and convert bespoke workflows into productized solutions.

Minimum Requirements

  • 1-7 years of software engineering experience (we are hiring at multiple levels for this role).
  • Hands-on experience with LLM evaluations and/or post-training methods, including how to design useful evals and use their results to guide model improvement.
  • Product Engineering Skills: The ability to work across backend systems and developer-facing product surfaces. Comfortable shipping full-stack features when needed.
  • Understanding of the GenAI Lifecycle: You understand the end-to-end workflow—from prompting a base model to curating a dataset, fine-tuning, and productionizing agents—and how these steps interconnect.
  • User-Centric Mindset: Willing to talk to users, triage GitHub issues for open-source projects, and build products from scratch to serve emerging needs.

Preferred Qualifications

  • 3+ years of software engineering experience.
  • Domain-Specific Evaluation Experience: Strong familiarity with designing and running evaluations for domain-specific use cases (e.g., medical, legal, coding, or custom internal datasets).
  • Open Source Contributions: Prior contributions to developer tools or AI/ML repositories.
  • Inference & Hardware Knowledge: Interest in the hardware side of AI—understanding GPU constraints, inference optimization techniques, and how they relate to model performance.
  • Startup DNA: Experience in fast-paced environments where you own features end-to-end.

Skills

LLM Evaluations, Post-Training, Fine-Tuning, SFT, RFT, APIs, SDKs, Full-Stack, GenAI, GPU, Inference Optimization