Applied AI Engineer, Codex Core Agent

230k – 325k | San Francisco, CA | Seattle, WA | New York, NY | Onsite
Summary

Develops and improves Codex AI agents for real-world software engineering tasks, focusing on performance, reliability, and integration with research and product teams. Requires strong Python, ML/LLM experience, and skills in evaluation, prompting, and debugging production failures.

About the role

What You’ll Do

  • Design and iterate on agent behaviors across real-world coding tasks and long-horizon workflows.
  • Work closely with research to develop and run evals to measure agent performance, regressions, failure modes, and edge cases.
  • Improve performance through prompting, tool-use strategies, context construction, and model-facing experimentation.
  • Analyze failures in production and systematically improve robustness and reliability.
  • Build feedback loops and data systems that get better real-task data into evaluation and research.
  • Work with product teams to shape user-facing agent experiences and the interfaces the agent depends on.
  • Help define what “good” looks like for agents completing complex tasks end-to-end.

You Might Be a Good Fit If You

  • Have experience building or shipping machine learning or LLM-powered products.
  • Are strong in Python and comfortable with modern ML tooling.
  • Have worked on model evaluation, fine-tuning, or prompt design.
  • Think in terms of systems and user outcomes, not just model metrics.
  • Enjoy debugging messy, real-world failures and turning them into improvements.
  • Want to work in the layer that turns research and model potential into systems that actually work for users.

Bonus Points

  • Experience with agent frameworks or tool-using LLM systems.
  • Research experience with code generation models or developer tooling.
  • Experience working with large, messy datasets or production logs.
Skills
Python, Machine Learning, LLMs, Prompt Engineering, Model Evaluation, Fine-tuning, Agent Frameworks, Tool-use LLMs, Code Generation, Data Systems