Sr. Software Engineer, AI

125k – 175k | Chicago, IL | Hybrid | 5+ YOE

Summary

Forward-deployed AI Engineer embedding with teams to build production agentic workflows, LLM applications, RAG pipelines, and MCP servers that automate work across Engineering, Operations, and Finance. Own end-to-end delivery from discovery through production monitoring on GCP.

About the role

Responsibilities

  • Design and build multi-step agentic workflows in Python and TypeScript — planning loops, tool dispatch, error recovery, and explicit human-in-the-loop checkpoints for high-stakes decisions
  • Develop production LLM applications on Anthropic and OpenAI SDKs, including prompt engineering, structured outputs, tool/function calling, prompt caching, and batch processing
  • Build and maintain RAG pipelines — embedding generation, vector/hybrid search, knowledge base ingestion
  • Own eval discipline end-to-end: define offline eval sets, run A/B experiments on model changes, build regression suites, and articulate “good enough” exit criteria using LangSmith, Braintrust, or equivalent
  • Drive cost and latency optimization — token budgets, model tier selection, and caching strategies that hold up at scale
  • Build MCP servers and function-calling connectors that give agents reliable, schema-governed access to internal tools, APIs, and data sources
  • Implement and maintain production integrations using REST, GraphQL, webhooks, and event-driven patterns (queues, Pub/Sub) with proper idempotency, retry logic, and backfill support
  • Wire up OAuth/SAML authentication flows (Okta in particular) for secure agent-to-service access across internal and third-party systems
  • Own cloud infrastructure for AI workloads on GCP using Terraform, GKE/Cloud Run, and secrets management — with logging, metrics, and alerting from day one
  • Build data pipelines that feed AI systems: strong SQL, Athena/BigQuery-class warehouses, ETL/ELT, schema design, and data-quality monitoring
  • Partner with internal teams across Engineering, Operations, Customer Support, Data, and Finance to identify where agentic automation can have the highest leverage — then build it
  • Create reusable libraries, SDKs, and internal tooling so teams can extend AI capabilities without starting from scratch
  • Act as a technical advisor and embedded engineer, translating ambiguous business problems into well-scoped AI systems with clear success metrics
  • Instrument and monitor deployed agents in production: take on-call responsibility for what you ship and treat reliability as a feature

Requirements

  • 5+ years of production software engineering experience, primarily in Python or TypeScript
  • Production LLM application experience with Anthropic or OpenAI SDKs — agents, structured outputs, tool use, RAG, evals, batch processing
  • Forward-deployed instinct: prior experience in forward-deployed engineering, developer relations, or solutions engineering
  • Strong evaluation discipline with the ability to define and defend exit criteria using LangSmith, Braintrust, or equivalent tools
  • Experience building multi-step tool-using agents with planning, error recovery, and human-in-the-loop design in production environments
  • Experience with RAG pipelines, embeddings, hybrid search, and the judgment to determine when retrieval improves outcomes
  • Experience building MCP servers, function-calling schemas, and sandboxed execution environments
  • Strong understanding of token budgets, model tier trade-offs, and AI cost/latency optimization strategies
  • Experience integrating REST APIs, GraphQL, webhooks, OAuth/SAML authentication (especially Okta), and event-driven architectures
  • Cloud-native engineering experience with GCP or AWS, including Terraform, containers, secrets management, logging, metrics, and alerting
  • Strong SQL and data engineering experience with modern warehouses, ETL/ELT pipelines, schema design, and data-quality monitoring
  • Ability to work cross-functionally and translate ambiguous business problems into production-ready AI systems
  • Strong communication skills with both technical and non-technical stakeholders

Nice-to-Haves

  • Trading industry, fintech, or capital markets experience
  • Futures trading knowledge
  • Experience with LangChain, LlamaIndex, or similar orchestration frameworks
  • Familiarity with observability tooling such as OpenTelemetry, Prometheus, and Grafana
  • Contributions to open-source AI or developer tooling projects

Compensation & Benefits

  • Salary range: $125,000 - $175,000 USD
  • Annual target bonus of 12%
  • 401(k) plan with company match up to 3.5% of employee contributions
  • 18 days of paid time off per year plus 7 paid holidays

Skills

Python, TypeScript, Anthropic SDK, OpenAI SDK, RAG, LangSmith, GCP, Terraform, GKE, GraphQL, REST APIs, OAuth, SAML, Okta, SQL