Research Engineer Intern, Evaluations
Designs evaluation frameworks and benchmarks that test the autonomy, reasoning, and reliability of AI agents working in data pipelines and data warehouses. Requires experience with LLM benchmarking, reinforcement learning, Python, PyTorch/JAX, and data engineering tools.