Sr. Software Engineer, AI
Forward-deployed AI Engineer embedding with teams to build production agentic workflows, LLM applications, RAG pipelines, and MCP servers that automate work across Engineering, Operations, and Finance. Own end-to-end delivery from discovery through production monitoring on GCP.
Responsibilities
- Design and build multi-step agentic workflows in Python and TypeScript — planning loops, tool dispatch, error recovery, and explicit human-in-the-loop checkpoints for high-stakes decisions
- Develop production LLM applications on Anthropic and OpenAI SDKs, including prompt engineering, structured outputs, tool/function calling, prompt caching, and batch processing
- Build and maintain RAG pipelines — embedding generation, vector/hybrid search, knowledge base ingestion
- Own eval discipline end-to-end: define offline eval sets, run A/B experiments on model changes, build regression suites, and articulate “good enough” exit criteria using LangSmith, Braintrust, or equivalent
- Drive cost and latency optimization — token budgets, model tier selection, and caching strategies that hold up at scale
- Build MCP servers and function-calling connectors that give agents reliable, schema-governed access to internal tools, APIs, and data sources
- Implement and maintain production integrations using REST, GraphQL, webhooks, and event-driven patterns (queues, Pub/Sub) with proper idempotency, retry logic, and backfill support
- Wire up OAuth/SAML authentication flows (Okta in particular) for secure agent-to-service access across internal and third-party systems
- Own cloud infrastructure for AI workloads on GCP using Terraform, GKE/Cloud Run, and secrets management — with logging, metrics, and alerting from day one
- Build data pipelines that feed AI systems, covering SQL, Athena/BigQuery-class warehouses, ETL/ELT, schema design, and data-quality monitoring
- Partner with internal teams across Engineering, Operations, Customer Support, Data, and Finance to identify where agentic automation can have the highest leverage — then build it
- Create reusable libraries, SDKs, and internal tooling so teams can extend AI capabilities without starting from scratch
- Act as a technical advisor and embedded engineer, translating ambiguous business problems into well-scoped AI systems with clear success metrics
- Instrument and monitor deployed agents in production, take on-call for what you ship, and treat reliability as a feature
Requirements
- 5+ years of production software engineering experience, primarily in Python or TypeScript
- Production LLM application experience with Anthropic or OpenAI SDKs — agents, structured outputs, tool use, RAG, evals, batch processing
- Forward-deployed instinct, shown through prior forward-deployed engineering, developer relations, or solutions engineering experience
- Strong evaluation discipline with the ability to define and defend exit criteria using LangSmith, Braintrust, or equivalent tools
- Experience building multi-step tool-using agents with planning, error recovery, and human-in-the-loop design in production environments
- Experience with RAG pipelines, embeddings, hybrid search, and the judgment to determine when retrieval improves outcomes
- Experience building MCP servers, function-calling schemas, and sandboxed execution environments
- Strong understanding of token budgets, model tier trade-offs, and AI cost/latency optimization strategies
- Experience integrating REST APIs, GraphQL, webhooks, OAuth/SAML authentication (especially Okta), and event-driven architectures
- Cloud-native engineering experience with GCP or AWS, including Terraform, containers, secrets management, logging, metrics, and alerting
- Strong SQL and data engineering experience with modern warehouses, ETL/ELT pipelines, schema design, and data-quality monitoring
- Ability to work cross-functionally and translate ambiguous business problems into production-ready AI systems
- Strong communication skills with both technical and non-technical stakeholders
Nice-to-Haves
- Trading industry, fintech, or capital markets experience
- Futures trading knowledge
- Experience with LangChain, LlamaIndex, or similar orchestration frameworks
- Familiarity with observability tooling such as OpenTelemetry, Prometheus, and Grafana
- Contributions to open-source AI or developer tooling projects
Compensation & Benefits
- Salary range: $125,000 - $175,000 USD
- Annual target bonus of 12%
- 401(k) plan with company match up to 3.5% of employee contributions
- 18 days of paid time off per year, plus seven paid holidays