Member of Technical Staff, Evals & Post-Training Product
San Mateo, CA | Full-stack Engineering | Onsite
Summary
Develops evaluation workflows and fine-tuning products that improve LLMs, working across the full stack from APIs to web interfaces. Requires 1-7 years of software engineering experience, hands-on experience with LLM evals or post-training, and user-centric product skills.
About the role
Key Responsibilities
- Build internal eval workflows: Design and scale evaluation tooling used by internal teams to measure model quality, compare model changes, and inform post-training decisions.
- Own fine-tuning product experiences: Build and improve user-facing product workflows for post-training, including fine-tuning across supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and related model-improvement capabilities.
- Work closely with users: Partner with customers and internal stakeholders to understand evaluation and fine-tuning needs, support high-priority engagements, triage issues, and convert bespoke workflows into productized solutions.
Minimum Requirements
- 1-7 years of software engineering experience (we are hiring at multiple levels for this role).
- Hands-on experience with LLM evaluations and/or post-training methods: Knowing how to design useful evals and use their results to guide model improvement.
- Product Engineering Skills: The ability to work across backend systems and developer-facing product surfaces. Comfortable shipping full-stack features when needed.
- Understanding of the GenAI Lifecycle: Familiarity with the end-to-end workflow (prompting a base model, curating a dataset, fine-tuning, and productionizing agents) and how these steps interconnect.
- User-Centric Mindset: Willing to talk to users, triage GitHub issues for open-source projects, and build products from scratch to serve emerging needs.
Preferred Qualifications
- 3+ years of software engineering experience.
- Domain-Specific Evaluation Experience: Strong familiarity with designing and running evaluations for domain-specific use cases (e.g. medical, legal, coding, or custom internal datasets).
- Open Source Contributions: Prior contributions to developer tools or AI/ML repositories.
- Inference & Hardware Knowledge: Interest in the hardware side of AI, including GPU constraints, inference optimization techniques, and how they relate to model performance.
- Startup DNA: Experience in fast-paced environments where you own features end-to-end.
Skills
LLM Evaluations, Post-Training, Fine-Tuning, SFT, RFT, APIs, SDKs, Full-Stack, GenAI, GPU, Inference Optimization