Responsibilities

Design and ship multi-step agentic systems (planner/executor, tool-using, multi-agent, human-in-the-loop) for onboarding, underwriting, case review, and continuous monitoring.
Architect agent graphs in LangGraph (or comparable — CrewAI, AutoGen, Claude Agent SDK) with explicit state, durable execution, retries, and safe fallbacks.
Build the retrieval layer powering our agents — chunking, hybrid search, reranking, and grounded citation.
Own the eval stack: golden sets, offline regression suites, LLM-as-judge, online A/B and shadow evals, and red-teaming for jailbreaks, prompt injection, and PII leakage.
Expose agents to production systems via well-typed tools and MCP servers. Treat tool surface area as a product.
Drive production MLOps: deployment, versioning, traffic shaping, cost/latency budgets, tracing, and on-call playbooks for agent incidents.
Partner with security and compliance to keep agents within SOC 2, GDPR, CCPA, and fair-lending requirements, with auditability and explainability built in.
Mentor engineers on agent patterns, prompt hygiene, eval discipline, and LLM failure modes.

Requirements

5+ years of software engineering experience, with 2+ years building production LLM or agentic systems (not just notebooks or demos).
Hands-on experience with a modern agent framework (LangGraph strongly preferred) and a track record of shipping agents that run, fail gracefully, and recover.
Strong RAG fundamentals: chunking, embeddings, hybrid retrieval, reranking, grounding — and judgment about when RAG isn’t the right answer.
Real eval experience: golden sets plus offline and online evaluations that you have used to make ship/no-ship calls.
Production MLOps fluency: deployed LLM workloads under real latency, cost, and reliability constraints.
Strong Python; comfortable in TypeScript / Node.js.
Solid systems engineering instincts: APIs, async patterns, queues, databases, distributed system failure modes.
Calibrated communicator; thrives in ambiguous, fast-moving environments.
Prior experience in fintech, lending, payments, KYB/KYC, fraud, or AML.

Nice-to-Haves

Experience building MCP servers or other structured tool interfaces for LLMs.
Background in classical ML (ranking, scoring, calibration).
Experience designing explainable / auditable AI workflows for regulated environments.
Open-source contributions to agent frameworks, eval tooling, or retrieval libraries.
AWS depth (EKS, MSK, RDS, S3, Lambda) and IaC with Terraform.

Technology Stack

Languages: Python, TypeScript (Node.js)
Agent / LLM frameworks: LangGraph, LangChain, Claude Agent SDK, MCP, OpenAI SDK
Models: Anthropic Claude, OpenAI, open-weight where appropriate
Retrieval & Data: PostgreSQL, pgvector, OpenSearch, Kafka, Redshift, Redis
Infra: AWS, Kubernetes (EKS), ArgoCD, Terraform
Evals & Observability: LangSmith / Langfuse / Braintrust-style tooling, Datadog

Benefits

Health Care Plan (Medical, Dental & Vision)
Retirement Plan (401(k), IRA)
Life Insurance
Flexible Paid Time Off
9 Paid Holidays
Family Leave