
AI Research Engineer – Datadog AI Research (DAIR)

Builds ML infrastructure and tooling to productionize AI research in observability models, SRE agents, and code repair. Requires strong Python/ML systems expertise, distributed computing experience, and proficiency in PyTorch/JAX.

140k – 400k · New York, NY · AI Research · Onsite

About the role

What You’ll Do:

  • Build and operate datasets, training and evaluation pipelines, benchmarks, and internal tooling
  • Implement models, run experiments at scale, and profile for reliability, performance, and cost
  • Orchestrate distributed training and distributed RL with Ray, including scheduling, scaling, and failure recovery
  • Make the research stack observable, reproducible, and easier to use
  • Establish rigorous automated benchmarks and regression tests for forecasting, anomaly detection, multi-modal analysis, agents, and code repair tasks
  • Collaborate with Research Scientists, Product, and Engineering to integrate advanced AI capabilities into Datadog’s product ecosystem
  • Contribute high-quality code, documentation, and open-source artifacts

Who You Are:

  • Strong software engineering skills with experience in observability, SRE, or security
  • Depth in distributed computing and ML systems; experience with Ray, Slurm, or similar
  • Proficient in Python, familiar with systems languages (Rust, C++, Go), and comfortable with cloud and data infrastructure
  • Practical experience with PyTorch or JAX, containerization, orchestration, and GPU acceleration
  • Familiar with efficient training, fine-tuning, and inference for large foundation models

Bonus Points:

  • Experience bridging research prototypes to products with foundation models or AI agents
  • Hands-on experience with GPU programming (CUDA)
  • Experience with production data pipelines

Benefits and Growth:

  • Competitive global benefits
  • New hire stock equity (RSUs) and ESPP
  • Opportunities to collaborate across NYC and Paris offices

Skills

Python, PyTorch, JAX, Ray, Rust, C++, Go, Kubernetes, CUDA, Slurm
