Cohere VIDEO 27 March 2026

Debjyoti Paul - Learning to Act Reinforcement Learning for Agentic LLM Systems

Video details
AI maker: Cohere
Published: 27 March 2026
Channel: Cohere
Playlist: Uploads from Cohere

About this video

Large Language Models (LLMs) have demonstrated impressive reasoning and generation abilities, but building agentic systems—AI that can plan, use tools, interact with environments, and achieve goals autonomously—requires more than prompting. A key challenge is enabling these systems to learn how to act, not just how to respond. This talk explores how Reinforcement Learning (RL) can transform LLMs into effective decision-making agents. We examine the architecture of modern agentic systems where LLMs serve as planners and reasoning engines, while RL provides the feedback loop that enables continuous improvement through interaction with tools, APIs, and external environments.

The session will walk through practical design patterns for integrating RL with LLM-based agents, including task decomposition, action selection, tool execution, and reward shaping. We will discuss how RL techniques such as policy optimization and reward modeling can help agents improve planning, reduce hallucinations, and learn reliable strategies for complex multi-step tasks.

Using concrete examples—from automated workflows to multi-step information retrieval and decision-making systems—we illustrate how RL-driven feedback can improve agent performance over time. We also discuss common challenges, including reward design, exploration, stability, and evaluation of agent behavior.

By the end of the talk, attendees will gain a practical understanding of how to design self-improving agentic AI systems that combine the reasoning capabilities of LLMs with the learning dynamics of reinforcement learning.
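The patterns named above (action selection, tool execution, reward shaping, policy optimization) can be illustrated with a toy sketch. This is not taken from the talk: the tool names, success rates, step cost, and learning rates below are all made-up assumptions, and a softmax policy over three tools stands in for an LLM planner. It uses a plain REINFORCE-style update with a running baseline, which is only one of many possible policy optimization methods.

```python
import math
import random

# Hypothetical tool names; a real agent would call actual APIs here.
TOOLS = ["search", "calculator", "noop"]


def softmax(prefs):
    """Turn preference scores into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_tool(tool, rng):
    """Toy environment: the task is assumed to need arithmetic, so the
    calculator usually succeeds. The success rates are invented."""
    success_rate = {"search": 0.3, "calculator": 0.9, "noop": 0.0}[tool]
    return rng.random() < success_rate


def shaped_reward(success, tool):
    """Task reward plus a shaping term: a small cost per real tool call."""
    cost = 0.0 if tool == "noop" else 0.05
    return (1.0 if success else 0.0) - cost


def train(episodes=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline over one-step episodes."""
    rng = random.Random(seed)
    prefs = [0.0] * len(TOOLS)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(prefs)
        # Action selection: sample a tool index from the current policy.
        action = rng.choices(range(len(TOOLS)), weights=probs)[0]
        # Tool execution and reward from the environment.
        reward = shaped_reward(run_tool(TOOLS[action], rng), TOOLS[action])
        advantage = reward - baseline
        baseline += 0.05 * advantage
        # Gradient of log softmax w.r.t. preference i: 1[i == action] - probs[i]
        for i in range(len(TOOLS)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * advantage * grad
    return softmax(prefs)


if __name__ == "__main__":
    for tool, p in zip(TOOLS, train()):
        print(f"{tool}: {p:.3f}")
```

In this sketch the policy should shift probability toward the tool with the highest shaped reward. A real agentic system would replace the three-way softmax with an LLM policy and the one-step episode with multi-step trajectories.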

Debjyoti is a Data Scientist at Amazon with over 9 years of industry experience in Natural Language Processing (NLP), Large Language Models (LLMs), Agentic AI, and Responsible AI. Debjyoti currently leads agentic development and works on agent learning, focusing on improving agentic systems through approaches ranging from context engineering to RL-based learning frameworks. Prior to this, Debjyoti led AI-driven solutions in enterprise applications focusing on anomaly detection, recommendation systems, NLP, information extraction, and computer vision. Their research expertise spans ethical AI governance, bias mitigation, scalable LLM deployment, and model interpretability, with the aim of ensuring responsible AI adoption across industries. With hands-on experience in developing production-grade AI systems, they actively research fairness, robustness, and transparency in AI, contributing to frameworks that enhance trust and accountability in AI-driven decision-making.

This session is brought to you by the Cohere Labs Open Science Community - a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Katrina Lawrence and Neel Ghoshal, Leads of our ML Math group, for their dedication in organizing this event.

If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.

Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommunityApp).

More videos from Cohere

All videos
Shuo Li Liu - Coherence in RLHF Preference Data
Cohere
24 Apr 2026

RLHF usually learns from pairwise comparisons, often through Bradley-Terry-style models. I will discuss what coherence requirements, such as Weak Stochastic Transitivity and the Weak Axiom of Revealed Preference, mean for preference trained...

Jiafei Duan - Building Robotics Foundation Model with Reasoning in the loop
Cohere
24 Apr 2026

Scaling alone won’t unlock general-purpose robotics. Integrating reasoning directly into robot learning (spatial, temporal, and failure-based) so robots can learn more from limited data and continuously self-improve is the path forward. Ji...

Aashish Rai - Video Native Representations for 4D Gaussian Scenes
Cohere
20 Apr 2026

Volumetric videos offer immersive 4D experiences, but remain difficult to reconstruct, store, and stream at scale. Existing Gaussian Splatting based methods achieve high-quality reconstruction but break down on long sequences, temporal inco...

Ekdeep Singh Lubana - From Probes to Rewards Using Interpretability to Shape Training
Cohere
20 Apr 2026

Ekdeep Singh Lubana - Guest Speaker @ Cohere Labs AI Safety & Alignment Reading Group. Ekdeep is MTS at Goodfire, previously research fellow at Harvard's Center for Brain Science. His recent work addresses some core issues with how we extra...
