
AI Engineer, Model Quality and Performance

Sunnyvale, CA | ML Engineering | Onsite
Summary

Own model quality and performance for Cerebras inference by building AI agent-driven eval suites, automating benchmarking, and creating customer-specific tooling. Requires strong AI agent experience and tooling intuition.

About the role

What You'll Do

  • Design eval suites with AI agents in the loop. For every model release, curate a thoughtful mix of advanced, basic, long-context, and customer-use-case-specific evals. Use Claude to generate, validate, and prune candidate test cases at speed.
  • Build custom evals for target customers by orchestrating AI agents to mine trajectories from their workloads and synthesize representative eval sets.
  • Automate eval execution end-to-end with AI-driven pipelines on top of standard tooling (Docker, Git, CI). The goal is a system that runs itself between releases, not a script you re-run by hand.
  • Build automations to forecast and benchmark model performance on Cerebras for our top customers, including modeling how fast customer-specific workloads will run in production.
  • Build product-quality tooling that synthesizes quality and performance data into a single, easy-to-use view.

Skills & Qualifications

  • Experience building AI agents. You ship real systems with Claude (or equivalent) as a force multiplier. You've built things that would have been infeasible solo without AI agents in the loop.
  • Strong math/stats background.
  • Comfort with Docker, Git, and the standard automation stack.
  • A taste for tooling design. You've shipped something that a non-engineer used without complaining. Bonus if AI helped you ship it.

Assets

  • Performance-tuning experience on custom silicon, GPUs, or FPGAs.
  • Experience designing evals for agentic / coding / long-context / multimodal use cases.
  • Familiarity with open-source eval frameworks (EvalScope, lm-eval-harness, etc.).
Skills

AI agents, Claude, Docker, Git, CI/CD, model evaluation, performance benchmarking, Python, math/stats, lm-eval-harness