Key Responsibilities

Build internal eval workflows: Design and scale evaluation tooling used by internal teams to measure model quality, compare model changes, and inform post-training decisions.
Own fine-tuning product experiences: Build and improve user-facing product workflows for post-training, including fine-tuning experiences across SFT, RFT, and related model-improvement capabilities.
Work closely with users: Partner with customers and internal stakeholders to understand evaluation and fine-tuning needs, support high-priority engagements, triage issues, and convert bespoke workflows into productized solutions.

Minimum Requirements

1-7 years of software engineering experience (we are hiring at multiple levels for this role).
Hands-on experience with LLM evaluations and/or post-training methods: You know how to design useful evals and use their results to guide model improvement.
Product Engineering Skills: The ability to work across backend systems and developer-facing product surfaces. Comfortable shipping full-stack features when needed.
Understanding of the GenAI Lifecycle: You understand the end-to-end workflow—from prompting a base model to curating a dataset, fine-tuning, and productionizing agents—and how these steps interconnect.
User-Centric Mindset: Willing to talk to users, triage GitHub issues for open-source projects, and build products from scratch to serve emerging needs.

Preferred Qualifications

3+ years of software engineering experience.
Domain-Specific Evaluation Experience: Strong familiarity with designing and running evaluations for domain-specific use cases (e.g. medical, legal, coding, or custom internal datasets).
Open Source Contributions: Prior contributions to developer tools or AI/ML repositories.
Inference & Hardware Knowledge: Interest in the hardware side of AI—understanding GPU constraints, inference optimization techniques, and how they relate to model performance.
Startup DNA: Experience in fast-paced environments where you own features end-to-end.