Software Engineer
Builds and debugs AI agent infrastructure for healthcare automation, including prompt engineering, runtime issue tracing, evaluation datasets, simulation tooling, data pipelines, and observability dashboards. Requires 2-7 years of experience with production LLMs or AI agents, plus TypeScript proficiency.
What You’ll Do
- Work across our AI agent platform — writing prompts, debugging runtime issues, building agent simulation tooling, creating evals, interfacing with client data, and helping monitor system behavior at scale.
- Trace and fix runtime bugs, then write regression tests.
- Design evaluation datasets to simulate realistic workflows or red-team our system.
- Build internal tooling for QA and agent simulation.
- Normalize and transform messy client data for system integration.
- Set up automated testing and latency tracking infrastructure.
- Create dashboards and observability tooling for agentic system behavior.
- Expand on our existing eval & testing framework and agent simulation infrastructure.
Skills Required
Technical Skills
- Proficiency in TypeScript
- Strong generalist software engineering skills
- Strong debugging skills (trace runtime failures, dig through logs, pinpoint issues in async or multi-step agent systems)
- Data transformation and ingestion (build pipelines to normalize and convert unstructured data for AI systems)
- Strong understanding of system design, including distributed systems and reliability/performance tradeoffs
- Experience using modern AI coding tools (e.g. Cursor, GitHub Copilot, Claude)
- Excellent documentation and testing discipline
- Proficiency with Git
Soft Skills
- Care about improving agent behavior
- High agency; thrive with minimal structure
- Comfortable getting in the weeds with details, edge cases, editing prompts, and writing evals
- Comfortable with ambiguity; work well with loose specs spanning prompts, code, and RLHF
- Learn fast and move fast; pattern-match from past systems work to LLM edge cases
Experience & Who Should Apply
- 2-7 years of experience working closely with LLMs or AI agents in production systems
- Created internal tools or frameworks for QA, evals, or agent simulation
- Contributed to fast-paced product cycles involving AI behavior, latency, and user experience
Nice to Have
- Experience with multi-agent systems, TTS/NLP pipelines, or structured output validation
- Familiarity with testing frameworks, LangChain-style agent orchestration, or in-house eval harnesses
- Experience with prompt engineering, LLM evals, and agent orchestration
Senior Machine Learning Operations Engineer
Build and operate Mercury's real-time ML inference platform for fraud risk decisioning. Own model deployment, observability, and lifecycle tooling with strong backend Python fundamentals.
AI Engineer, Evaluation
Design and implement evaluation frameworks and pipelines for AI systems using Evaluation-Driven Development. Build Python-based test suites, LLM graders, and measurement systems that guide prompt iteration and production deployment decisions.
Senior AI Engineer
Senior Engineer building multi-agent AI systems, LLM integrations, and backend automation services that power Marketing Operations. Owns technical direction for agentic infrastructure connecting models to business systems.
Software Engineer, ML Infrastructure
Build and scale the ML infrastructure platform for autonomous vehicle development, focusing on automated resource provisioning, high-performance workload scheduling, and petabyte-scale data processing pipelines.