Cohere VIDEO · 13 April 2026 · YouTube


Juan Sebastian Rojas - A Differential Perspective on Risk Aware Reinforcement Learning


Video details
AI maker: Cohere · Published: 13 April 2026 · Channel: Cohere · Playlist: Uploads from Cohere · Watch on YouTube

About this video

The field of reinforcement learning has long been dominated by discounted methods, wherein a decision-making agent aims to optimize a potentially discounted sum of rewards over time. In this talk, we explore a fundamentally different and under-explored decision-making framework, in which the agent aims to optimize the reward received per time step. Methods associated with this framework are typically referred to as differential or average-reward methods. In particular, we will show how differential methods have unique structural properties that make it possible to circumvent some of the typical challenges and subtleties of risk-aware decision-making, in which the agent is tasked with learning and/or optimizing a performance measure other than the usual (risk-neutral) mean. In the first half of the talk, we will show how the differential framework admits a more scalable family of distributional RL algorithms than discounted methods do. In the second half, we will show how the unique structural properties of differential RL can be leveraged to optimize, for the first time, the well-known conditional value-at-risk (CVaR) measure in a fully online manner, without an explicit bi-level optimization scheme or an augmented state space.
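To make the average-reward idea concrete, here is a minimal sketch of tabular differential TD learning, which maintains a running estimate of the reward rate alongside a differential value function. This is an illustrative sketch, not code from the talk: the environment interface (env.reset, env.step, env.sample_action) and the step sizes alpha and eta are assumptions.

```python
import numpy as np

def differential_td(env, num_states, alpha=0.1, eta=0.01, steps=100_000):
    """Tabular differential TD learning (average-reward prediction sketch)."""
    v = np.zeros(num_states)  # differential (relative) value estimates
    r_bar = 0.0               # running estimate of the reward rate (reward per step)
    s = env.reset()
    for _ in range(steps):
        a = env.sample_action(s)        # assumed behavior-policy interface
        s_next, r = env.step(s, a)      # assumed environment interface
        # No discount factor: the TD error subtracts the reward-rate estimate.
        delta = r - r_bar + v[s_next] - v[s]
        v[s] += alpha * delta           # update the differential value
        r_bar += eta * delta            # update the reward-rate estimate from the same error
        s = s_next
    return v, r_bar
```

Here r_bar is exactly the "reward received per time step" that the differential framework optimizes, which is why no discount factor appears. For the CVaR half of the talk, recall the standard Rockafellar-Uryasev variational form (stated here for a loss variable X; the talk's reward-based formulation may differ in sign conventions), which is what makes online, stochastic updates of a quantile-like parameter plausible in the first place:

$$ \mathrm{CVaR}_{\alpha}(X) = \min_{c \in \mathbb{R}} \left\{ c + \frac{1}{1-\alpha}\, \mathbb{E}\big[(X - c)^{+}\big] \right\} $$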

Juan Sebastian Rojas is a PhD student at the University of Toronto, where he conducts research as part of the Dynamic Optimization & Reinforcement Learning Lab. His research interests lie in the theory and application of reinforcement learning agents. His current research focuses on developing theoretical frameworks and algorithms that incorporate the notions of risk and longevity into the learning, planning, and decision-making processes of reinforcement learning agents operating in dynamic, uncertain, and safety-critical environments. Juan has over five years of experience in industry, where he’s contributed to projects that span a broad range of topics, including machine learning, robotics, software engineering, and data science.

This session is brought to you by the Cohere Labs Open Science Community - a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Rahul Narava and Gusti Winata, Leads of our Reinforcement Learning group, for their dedication in organizing this event.

If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.

Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommunityApp).

More videos from Cohere

All videos
Shuo Li Liu - Coherence in RLHF Preference Data
Cohere
24 Apr 2026


RLHF methods usually learn from pairwise comparisons, often through Bradley-Terry-style models. I will discuss what coherence requirements, such as Weak Stochastic Transitivity and the Weak Axiom of Revealed Preference, mean for preference trained... (a minimal Bradley-Terry sketch follows this preview)

Open video →
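As referenced in the preview above, here is a minimal sketch of the Bradley-Terry preference probability that RLHF pipelines of this kind typically fit. The score names are illustrative assumptions (e.g., reward-model outputs), not details from the talk.

```python
import math

def bradley_terry_prob(s_i: float, s_j: float) -> float:
    """Probability that item i is preferred to item j under Bradley-Terry.

    With strengths exp(s_i) and exp(s_j),
    P(i > j) = exp(s_i) / (exp(s_i) + exp(s_j)),
    i.e., a sigmoid of the score difference.
    """
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

# Preferences generated this way are stochastically transitive: if
# P(i > j) >= 1/2 and P(j > k) >= 1/2, then P(i > k) >= 1/2, which is the
# Weak Stochastic Transitivity condition named in the preview.
```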
Jiafei Duan - Building Robotics Foundation Model with Reasoning in the loop
Cohere
24 Apr 2026


Scaling alone won’t unlock general-purpose robotics. Integrating reasoning directly into robot learning (spatial, temporal, and failure-based) so robots can learn more from limited data and continuously self-improve is the path forward. Ji...

Open video →
Aashish Rai - Video Native Representations for 4D Gaussian Scenes
Cohere
20 Apr 2026


Volumetric videos offer immersive 4D experiences, but remain difficult to reconstruct, store, and stream at scale. Existing Gaussian Splatting based methods achieve high-quality reconstruction but break down on long sequences, temporal inco...

Open video →
Ekdeep Singh Lubana - From Probes to Rewards: Using Interpretability to Shape Training
Cohere
20 Apr 2026


Ekdeep Singh Lubana — Guest Speaker @ Cohere Labs AI Safety & Alignment Reading Group. Ekdeep is MTS at Goodfire, previously a research fellow at Harvard's Center for Brain Science. His recent work addresses some core issues with how we extra...

Open video →
