What You'll Achieve
- Context engineering: Design, test, and iterate on system prompts, tool prompts, and context strategies.
- Understand & debug: Analyze production data, transcripts, logs, and user feedback; reproduce issues and find root causes.
- Build evals & measurement: Design eval strategies, build datasets, track quality, and own improvement loops.
- Evaluate and launch new models: Benchmark models from OpenAI, Anthropic, and Google on quality, latency, cost, and edge cases.
- Drive quality priorities: Surface issues to engineering and product teams, and own the quality narrative.
- Build tooling & systems: Manage AI observability tooling (e.g., Braintrust), and build playbooks and internal tools.
Skills You'll Need
- Driver mentality, bias to action.
- Curiosity about LLM capabilities.
- Analytical instinct; able to find signal in noise.
- Comfortable with data (SQL, coding agents).
- Clear communication.
- Experience with LLMs, prompting, or AI products.
Nice to Haves
- Background in engineering, product, data science, research, or consulting.
- Have built personal projects, side projects, or startups.
Compensation
For New York City: $98,000 - $140,000 base salary per year, plus equity and benefits.