Staff Applied AI Engineer
United States | ML Engineering | Remote
Summary
Leads architecture of AI agents that convert natural language to production full-stack apps, driving multi-provider LLM strategies, tool integration, evaluation standards, and cross-team AI initiatives. Requires deep LLM expertise, prompt engineering, and scalable systems design.
About the role
How You'll Contribute
- Define AI Agent Architecture: Lead the design and evolution of our AI agent systems, establishing patterns, frameworks, and standards that teams across the organization adopt. Own the technical vision for how agents manage context, orchestrate workflows, and scale to handle increasingly complex user needs.
- Drive Multi-Provider Strategy: Shape our approach to leveraging models from providers such as OpenAI (GPT series), Anthropic (Claude), and Google (Gemini). Establish evaluation frameworks and selection criteria that teams use to choose the right model for a given task. Build relationships with provider teams to influence roadmaps and beta-test new capabilities.
- Architect Tool Use and Workflow Systems: Design the foundational systems that enable AI agents to call external tools and APIs safely and effectively. Define the abstractions and interfaces that allow the agent to perform actions like web searches, database queries, and domain-specific operations. Evaluate and recommend frameworks such as Vercel's AI SDK, LangGraph, and others, establishing best practices for the organization.
- Cross-Team Leadership: Partner with teams across engineering, product, and design to align AI initiatives with business objectives. Influence roadmaps, resolve technical disagreements, and ensure AI-driven features are architected for long-term maintainability and performance. Mentor senior and mid-level engineers, raising the bar for AI engineering practices across the organization.
- Establish Data and Evaluation Standards: Define the methodology for collecting, curating, and analyzing datasets from agent responses and multi-turn conversations. Build and steward the evaluation harness, ensuring evals directly support business objectives and KPIs. Turn insights from conversation patterns, failure modes, and success signals into systematic improvements.
- Drive Research and Innovation: Stay at the forefront of NLP and LLM research, identifying and championing novel techniques that provide competitive advantage. Lead experimentation with new prompting strategies, context handling methods, and fine-tuning opportunities. Represent StackBlitz in external forums, conferences, and community discussions where appropriate.
Qualifications
- TypeScript: Familiarity with TypeScript is important, since our entire stack is built on it. Willingness to work in TypeScript daily is key.
- Deep LLM Experience: Extensive hands-on experience working with Large Language Models (LLMs), with a nuanced understanding of their capabilities, limitations, and emergent behaviors. Proven track record of building and scaling production AI systems.
- Prompt Engineering: Deep expertise in prompt engineering with the ability to establish best practices and mentor others. Skilled at crafting, refining, and optimizing prompts across different tasks, models, and use cases.
- Software Engineering Excellence: Strong software engineering fundamentals with experience designing systems that scale. Able to make architectural decisions that balance immediate needs with long-term maintainability.
- Strategic Execution: Ability to take ambiguous, high-scope problems and drive them to completion with minimal oversight. Comfortable influencing direction across teams and navigating complex technical and organizational challenges.
- Systems Thinking: Ability to identify process, communication, and technical debt across the organization and propose solutions that accelerate velocity for multiple teams.
- Data-Driven Leadership: Experienced in establishing data collection and analysis practices. Able to build evaluation frameworks, identify patterns in agent behavior, and translate findings into organizational improvements.
- Strong verbal and written English communication skills are required.
Bonus Points
- DSPy Framework: Familiarity with DSPy (Declarative Self-improving Python) for building modular AI systems and optimizing prompts programmatically.
- Machine Learning Background: Understanding of ML fundamentals and experience with model evaluation metrics.
- Open Source Contributions: Experience contributing to or maintaining open-source AI/ML projects.
- Research Background: Experience reading and implementing techniques from AI/ML research papers.
- Experience speaking at conferences, publishing technical content, or representing an organization in industry forums.
Skills
TypeScript, LLMs, Prompt Engineering, OpenAI, Anthropic, Google Gemini, Vercel AI SDK, LangGraph, DSPy, NLP