How You'll Contribute

Develop AI Agents: Design and implement AI agent features and extend existing agents with new capabilities. This includes managing the agent’s context (using techniques like sub-agents, retrieval-based context management, and sliding context windows) so it can handle long conversations and large knowledge bases and codebases efficiently.
Integrate Multiple LLM Providers: Leverage models from providers such as OpenAI (GPT series), Anthropic (Claude), and Google (Gemini). Quantitatively evaluate and choose the best model for a given task, and incorporate new model features or improvements (often by beta-testing new releases and assessing their strengths).
Tool Use and Workflow Orchestration: Enable the AI agent to call external tools and APIs safely and effectively. Implement structured approaches that allow the agent to perform actions such as searching the web, querying databases, fetching additional information, and running other domain-specific operations. Use frameworks such as Vercel’s AI SDK and LangGraph to build multi-step AI workflows.
Team Collaboration: Work closely with your immediate team and adjacent teams to deliver AI-powered features. Collaborate effectively with peer engineers and product managers to ensure AI-driven features are production-ready, efficient, maintainable, and well-monitored in deployment. Share knowledge and help lift up mid-level and junior engineers on the team.
Data Collection and Analysis: Collect and curate datasets from agent responses and multi-turn conversations to understand agent behavior. Analyze conversation patterns, failure modes, and success signals to derive actionable insights that drive improvements to agent performance and user experience.
Continuous Improvement and Evaluation: Stay up to date with the latest research in NLP and LLMs, and experiment with novel techniques (e.g. new prompting strategies, context handling methods, model fine-tuning opportunities). Continuously evaluate the AI system’s performance using systematic tests and user feedback, and iterate on prompts, agents, and workflows to improve output quality and reliability (for example, by developing automated LLM evaluation benchmarks).

Qualifications

TypeScript: Familiarity with TypeScript is important. Our entire stack is built on it. Willingness to work in TS daily is key.
LLM Experience: Hands-on experience working with Large Language Models (LLMs) and understanding their capabilities and limitations. Proven experience building applications or systems powered by LLMs.
Prompt Engineering: Deep understanding of prompt engineering best practices to guide LLM behavior. Able to craft, refine, and optimize prompts for different tasks and models.
Software Engineering Skills: Solid software engineering fundamentals with experience in building production-ready systems.
Autonomous Execution: Ability to deliver high-quality work with minimal oversight. Comfortable owning tasks end-to-end, managing complexity, and driving projects to completion within your functional area.
Problem-Solving: Strong analytical and problem-solving skills with the ability to debug complex AI behaviors and bring clarity to ambiguous tasks.
Data-Driven Mindset: Comfortable collecting, curating, and analyzing data to inform decisions. Able to build datasets, identify patterns in agent behavior, and translate findings into actionable improvements.
Communication: Strong verbal and written English communication skills are required.

Bonus Points

DSPy Framework: Familiarity with DSPy (Declarative Self-improving Python) for building modular AI systems and optimizing prompts programmatically.
Machine Learning Background: Understanding of ML fundamentals and experience with model evaluation metrics.
Open Source Contributions: Experience contributing to or maintaining open-source AI/ML projects.
Research Background: Experience reading and implementing techniques from AI/ML research papers.