AI Research Scientist – Datadog AI Research (DAIR)

Conducts cutting-edge research in Generative AI, building foundation models and autonomous agents for cloud observability, SRE, and code repair. Requires PhD in ML or related field, publications at top conferences, and expertise in PyTorch/TensorFlow distributed training.

140k – 400k · New York, NY · AI Research · Onsite

About the role

What You’ll Do

  • Conduct cutting-edge research in Generative AI and Machine Learning to build specialized Foundation Models and AI Agents for observability, SRE, and code repair.
  • Leverage large-scale distributed training infrastructure to pre-train and post-train state-of-the-art models on diverse telemetry data.
  • Build simulated environments for on-policy agentic training and evaluation.
  • Lead and contribute to research publications at top conferences (NeurIPS, ICLR, ICML), and release open-source model artifacts and benchmarks.
  • Collaborate with cross-functional teams to integrate AI capabilities into Datadog’s products.
  • Stay at the forefront of LLMs, Foundation Models, and Generative AI research.
  • Foster scientific rigor, innovation, and practical impact through reading groups and mentoring.

Who You Are

  • PhD in Computer Science, Machine Learning, or related field with expertise in generative modeling, AI agents, reinforcement learning, or NLP (or equivalent experience).
  • Extensive experience designing and implementing deep learning models and agents; strong background in distributed training (DeepSpeed, Megatron-LM) and ML libraries (PyTorch, TensorFlow).
  • Proven track record of impactful research with publications at top venues (NeurIPS, ICLR, ICML, TMLR).
  • Familiar with efficient training, post-training, fine-tuning, and inference for large foundation models.
  • Excel at explaining complex models to technical and non-technical audiences.
  • Strong interest in open science and open-source contributions.

Bonus Points

  • Ability to bridge research and real-world product applications, especially foundation models, generative AI agents, or domain-specific LLMs.
  • Passion for customer impact, scalability, and responsible AI deployment.
  • Experience writing production data pipelines and applications.
  • Hands-on GPU programming and optimization, including CUDA.

Skills

PyTorch, TensorFlow, DeepSpeed, Megatron-LM, Generative AI, Foundation Models, Reinforcement Learning, AI Agents, Distributed Training, CUDA
