Sr. Software Engineer, AI

125k – 175k | Chicago, IL | Hybrid | 5+ YOE | May 20

Summary

Forward-deployed AI Engineer embedding with teams to build production agentic workflows, LLM applications, RAG pipelines, and MCP servers that automate work across Engineering, Operations, and Finance. Own end-to-end delivery from discovery through production monitoring on GCP.

About the role

Responsibilities

Design and build multi-step agentic workflows in Python and TypeScript — planning loops, tool dispatch, error recovery, and explicit human-in-the-loop checkpoints for high-stakes decisions
Develop production LLM applications on Anthropic and OpenAI SDKs, including prompt engineering, structured outputs, tool/function calling, prompt caching, and batch processing
Build and maintain RAG pipelines — embedding generation, vector/hybrid search, knowledge base ingestion
Own eval discipline end-to-end: define offline eval sets, run A/B experiments on model changes, build regression suites, and articulate “good enough” exit criteria using LangSmith, Braintrust, or equivalent
Drive cost and latency optimization — token budgets, model tier selection, and caching strategies that hold up at scale
Build MCP servers and function-calling connectors that give agents reliable, schema-governed access to internal tools, APIs, and data sources
Implement and maintain production integrations using REST, GraphQL, webhooks, and event-driven patterns (queues, Pub/Sub) with proper idempotency, retry logic, and backfill support
Wire up OAuth/SAML authentication flows (Okta in particular) for secure agent-to-service access across internal and third-party systems
Own cloud infrastructure for AI workloads on GCP using Terraform, GKE/Cloud Run, and secrets management — with logging, metrics, and alerting from day one
Build data pipelines that feed AI systems, drawing on strong SQL, Athena/BigQuery-class warehouses, ETL/ELT, schema design, and data-quality monitoring
Partner with internal teams across Engineering, Operations, Customer Support, Data, and Finance to identify where agentic automation can have the highest leverage — then build it
Create reusable libraries, SDKs, and internal tooling so teams can extend AI capabilities without starting from scratch
Act as a technical advisor and embedded engineer, translating ambiguous business problems into well-scoped AI systems with clear success metrics
Instrument and monitor deployed agents in production; be on call for what you ship and treat reliability as a feature

Requirements

5+ years of production software engineering experience, primarily in Python or TypeScript
Production LLM application experience with Anthropic or OpenAI SDKs — agents, structured outputs, tool use, RAG, evals, batch processing
Forward-deployed instinct: engineering, developer relations, or solutions engineering experience
Strong evaluation discipline with the ability to define and defend exit criteria using LangSmith, Braintrust, or equivalent tools
Experience building multi-step tool-using agents with planning, error recovery, and human-in-the-loop design in production environments
Experience with RAG pipelines, embeddings, hybrid search, and the judgment to determine when retrieval improves outcomes
Experience building MCP servers, function-calling schemas, and sandboxed execution environments
Strong understanding of token budgets, model tier trade-offs, and AI cost/latency optimization strategies
Experience integrating REST APIs, GraphQL, webhooks, OAuth/SAML authentication (especially Okta), and event-driven architectures
Cloud-native engineering experience with GCP or AWS, including Terraform, containers, secrets management, logging, metrics, and alerting
Strong SQL and data engineering experience with modern warehouses, ETL/ELT pipelines, schema design, and data-quality monitoring
Ability to work cross-functionally and translate ambiguous business problems into production-ready AI systems
Strong communication skills with both technical and non-technical stakeholders

Nice-to-Haves

Trading industry, fintech, or capital markets experience
Futures trading knowledge
Experience with LangChain, LlamaIndex, or similar orchestration frameworks
Familiarity with observability tooling such as OpenTelemetry, Prometheus, and Grafana
Contributions to open-source AI or developer tooling projects

Compensation & Benefits

Salary range: $125,000 - $175,000 USD
Annual target bonus of 12%
401(k) plan with company match up to 3.5% of employee contributions
18 days paid time off per year plus seven paid holidays

Skills

Python, TypeScript, Anthropic SDK, OpenAI SDK, RAG, LangSmith, GCP, Terraform, GKE, GraphQL, REST APIs, OAuth, SAML, Okta, SQL
