Senior AI Engineer
Build and deploy production-grade agentic AI systems and automation workflows that drive efficiency across sales, marketing, finance, and other business functions. Partner with stakeholders to identify high-impact use cases and deliver reliable, observable LLM-powered solutions.
What You'll Do
Prioritization of High-Impact Automation
- Partner with business teams (e.g., Revenue Ops, Marketing, Finance Ops, Talent Acquisition) to catalog manual, high-frequency workflows and rank them by impact, feasibility, and urgency.
Production Agent Design & Development
- Build multi-step, tool-augmented agent workflows that can plan, execute, observe outcomes, and iteratively improve.
- Design and implement planner–executor and reflection-based architectures to enhance reasoning quality and task reliability.
- Develop stateful agent systems and incorporate human-in-the-loop controls, including approval gates, fallback paths, and escalation mechanisms.
Platform Integration
- Build on the Agentic AI Foundations team's platform, which covers agent runtime, orchestration, memory, tool registry, guardrails, and observability.
- Provide feedback that shapes platform priorities.
Innovation & Standards
- Leverage LLMs, multi-agent frameworks, and orchestration platforms to create differentiated internal solutions.
- Stay current with emerging AI technologies and the regulatory frameworks that govern them.
Technical Patterns
- Discover reusable agent design patterns while building real workflows and contribute them back to the Agentic AI Foundations paved paths.
Operational Excellence
- Leverage the evaluation harness and tracing substrate for continuous performance assessment, failure-mode analysis, and optimization of speed and accuracy.
What We're Looking For
Required
- 8+ years of software engineering experience, with at least 2 years focused on AI/ML systems or LLM-powered applications in production.
- Deep hands-on experience with LLM APIs and at least one agentic framework (e.g., LangGraph, CrewAI, or AutoGen).
- Strong Python skills and experience building production-grade backend services, APIs, and data pipelines.
- Proven ability to operate in ambiguity: you have walked into an undefined problem space, figured out what to build, built it, and measured the results.
- Experience shipping AI/automation solutions that directly impacted business operations.
- Strong systems thinking: you design for reliability, observability, and maintainability from day one.
- Ability to collaborate directly with non-technical business stakeholders to understand their workflows and translate those into technical solutions.
Preferred
- Experience with multi-agent systems, workflow orchestration, or complex tool-use patterns in production.
- Familiarity with evaluation frameworks (e.g., LangSmith, Weights & Biases, Arize, or custom eval pipelines).
- Experience with RAG pipelines, vector databases, knowledge graphs, or memory/grounding systems.
- Experience with real-time or streaming AI systems.
- Familiarity with AI safety and security practices, including prompt injection prevention, hallucination mitigation, and data protection and privacy.
- Background in a regulated industry (fintech, healthcare, government).
- Experience with agent skill abstraction and structured tool integration via MCP (Model Context Protocol), function calling, or similar protocols.
- Experience with AWS-hosted LLM infrastructure (Bedrock, AgentCore/Strands, Lambda, SageMaker).