Senior Agentic (AI) Engineer
Orlando, FL | ML Engineering | Remote | 5+ YOE
Summary
Designs and deploys production agentic AI systems with LangGraph to automate KYB, underwriting, and risk decisions in fintech. Requires 5+ years of software engineering experience, including 2+ years building production LLM or agentic systems, plus strong RAG, evaluation, and MLOps fundamentals and proficiency in Python and TypeScript.
About the role
Responsibilities
- Design and ship multi-step agentic systems (planner/executor, tool-using, multi-agent, human-in-the-loop) for onboarding, underwriting, case review, and continuous monitoring.
- Architect agent graphs in LangGraph (or comparable — CrewAI, AutoGen, Claude Agent SDK) with explicit state, durable execution, retries, and safe fallbacks.
- Build the retrieval layer powering our agents — chunking, hybrid search, reranking, and grounded citation.
- Own the eval stack: golden sets, offline regression suites, LLM-as-judge, online A/B and shadow evals, and red-teaming for jailbreaks, prompt injection, and PII leakage.
- Expose agents to production systems via well-typed tools and MCP servers. Treat tool surface area as a product.
- Drive production MLOps: deployment, versioning, traffic shaping, cost/latency budgets, tracing, and on-call playbooks for agent incidents.
- Partner with security and compliance to keep agents inside SOC 2, GDPR, CCPA, and fair-lending posture — auditability and explainability built in.
- Mentor engineers on agent patterns, prompt hygiene, eval discipline, and LLM failure modes.
Requirements
- 5+ years of software engineering experience, with 2+ years building production LLM or agentic systems (not just notebooks or demos).
- Hands-on experience with a modern agent framework (LangGraph strongly preferred) and a track record of shipping agents that run, fail gracefully, and recover.
- Strong RAG fundamentals: chunking, embeddings, hybrid retrieval, reranking, grounding — and judgment about when RAG isn’t the right answer.
- Real eval experience: golden sets, offline and online evaluations, used to make ship/no-ship calls.
- Production MLOps fluency: deployed LLM workloads under real latency, cost, and reliability constraints.
- Strong Python; comfortable in TypeScript / Node.js.
- Solid systems engineering instincts: APIs, async patterns, queues, databases, distributed system failure modes.
- Calibrated communicator; thrives in ambiguous, fast-moving environments.
- Prior experience in fintech, lending, payments, KYB/KYC, fraud, or AML.
Nice-to-Haves
- Experience building MCP servers or other structured tool interfaces for LLMs.
- Background in classical ML (ranking, scoring, calibration).
- Experience designing explainable / auditable AI workflows for regulated environments.
- Open-source contributions to agent frameworks, eval tooling, or retrieval libraries.
- AWS depth (EKS, MSK, RDS, S3, Lambda) and IaC with Terraform.
Technology Stack
Languages: Python, Node.js, TypeScript
Agent / LLM frameworks: LangGraph, LangChain, Claude Agent SDK, MCP, OpenAI SDK
Models: Anthropic Claude, OpenAI, open-weight where appropriate
Retrieval & Data: PostgreSQL, pgvector, OpenSearch, Kafka, Redshift, Redis
Infra: AWS, Kubernetes (EKS), ArgoCD, Terraform
Evals & Observability: LangSmith / Langfuse / Braintrust-style tooling, Datadog
Benefits
- Health Care Plan (Medical, Dental & Vision)
- Retirement Plan (401k, IRA)
- Life Insurance
- Flexible Paid Time Off
- 9 paid Holidays
- Family Leave
Skills
LangGraph, LangChain, Python, TypeScript, Node.js, RAG, MLOps, Kubernetes, AWS, PostgreSQL, pgvector, OpenSearch, Kafka, Redis, Terraform