Cohere VIDEO 23 March 2026
YouTube


Mansi Maheshwari - Addressing the Plasticity Stability Dilemma in Reinforcement Learning


Video details
AI maker: Cohere · Published: 23 March 2026 · Channel: Cohere · Playlist: Uploads from Cohere · Watch on YouTube

About this video

Neural networks have shown remarkable success in supervised learning when trained on a single task using a fixed dataset. However, when neural networks are trained on a reinforcement learning task, their ability to continue learning from new experiences declines over time. This decline in learning ability is known as plasticity loss. To restore plasticity, prior work has explored periodically resetting the parameters of the learning network, a strategy that often improves overall performance. However, such resets come at the cost of a temporary drop in performance, which can be dangerous in real-world settings. To overcome this instability, we introduce AltNet, a reset-based approach that restores plasticity without performance degradation by leveraging twin networks.
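The abstract does not specify AltNet's algorithm, but the general idea it describes (periodic resets to restore plasticity, with a twin network so the acting policy never suffers a post-reset performance drop) can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: the scheduler class, the reset interval, and the warm-up logic are all invented here for exposition.

```python
import numpy as np

# Illustrative sketch (NOT the AltNet paper's actual algorithm): two parameter
# sets where one network is "active" (used for acting) while the other is
# periodically re-initialized to restore plasticity, trained in the background,
# and only swapped in after a warm-up, so the agent never acts with freshly
# reset weights.
class TwinNetworkScheduler:
    def __init__(self, dim, reset_interval, warmup_steps, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dim = dim
        self.params = [self._init(), self._init()]  # twin parameter vectors
        self.active = 0                 # index of the network used for acting
        self.reset_interval = reset_interval
        self.warmup_steps = warmup_steps
        self.step_count = 0
        self.warmup_left = 0            # >0 while the standby net is catching up

    def _init(self):
        return self.rng.normal(0.0, 0.1, size=self.dim)

    def step(self):
        self.step_count += 1
        standby = 1 - self.active
        if self.warmup_left > 0:
            # Standby network trains in the background (placeholder update;
            # a real agent would run gradient steps / distillation here).
            self.params[standby] += 0.01
            self.warmup_left -= 1
            if self.warmup_left == 0:
                self.active = standby   # swap roles: no drop for acting policy
        elif self.step_count % self.reset_interval == 0:
            # Re-initialize the standby network to restore its plasticity.
            self.params[standby] = self._init()
            self.warmup_left = self.warmup_steps

    def acting_params(self):
        return self.params[self.active]
```

For example, with `reset_interval=10` and `warmup_steps=3`, the standby network is reset at step 10 and the swap happens at step 13; at no point are the acting parameters themselves reset, which is the stability property the abstract emphasizes.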

More broadly, plasticity underpins several desirable attributes of effective RL agents: rapid adaptation to distribution shift, efficient reuse of past data, and high performance with limited interactions. Without the capacity to change, these goals are compromised. Viewed through this lens, AltNet addresses more than plasticity loss: it enables rapid adaptation and efficient data reuse while maintaining stable learning dynamics through its twin-network anchoring mechanism. Together, these capabilities are foundational for reinforcement learning agents that must continuously adapt over time while remaining stable and data-efficient.

Mansi Maheshwari is a Master's student in Computer Science at the University of Massachusetts Amherst, where she is advised by Professor Bruno Castro da Silva at the Autonomous Learning Lab. Her research focuses on lifelong reinforcement learning, studying how RL agents can continually adapt under non-stationarity. This work has been published at CoLLAs 2025 (poster) and accepted at AAMAS 2026 (oral). Alongside her research, Mansi is deeply committed to broadening participation in AI. She is teaching Fundamentals of AI to high school students as an AI Instructor at the University of Washington and is consulting with iCEV to help design an upcoming AI textbook for secondary education. Previously, she earned her B.S. in Electrical Engineering from the University of Washington.

This session is brought to you by the Cohere Labs Open Science Community - a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Rahul Narava and Gusti Winata, Leads of our Reinforcement Learning group, for their dedication in organizing this event.

If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.

Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommunityApp).

More videos from Cohere

All videos
Shuo Li Liu - Coherence in RLHF Preference Data
Cohere
24 Apr 2026

Shuo Li Liu - Coherence in RLHF Preference Data

RLHF usually learns from pairwise comparisons, often through Bradley-Terry-style models. I will discuss what coherence requirements, such as Weak Stochastic Transitivity and the Weak Axiom of Revealed Preference, mean for preference trained...

Open video →
Jiafei Duan - Building Robotics Foundation Model with Reasoning in the loop
Cohere
24 Apr 2026

Jiafei Duan - Building Robotics Foundation Model with Reasoning in the loop

Scaling alone won’t unlock general-purpose robotics. Integrating reasoning directly into robot learning (spatial, temporal, and failure-based) so robots can learn more from limited data and continuously self-improve is the path forward. Ji...

Open video →
Aashish Rai - Video Native Representations for 4D Gaussian Scenes
Cohere
20 Apr 2026

Aashish Rai - Video Native Representations for 4D Gaussian Scenes

Volumetric videos offer immersive 4D experiences, but remain difficult to reconstruct, store, and stream at scale. Existing Gaussian Splatting based methods achieve high-quality reconstruction but break down on long sequences, temporal inco...

Open video →
Ekdeep Singh Lubana - From Probes to Rewards: Using Interpretability to Shape Training
Cohere
20 Apr 2026

Ekdeep Singh Lubana - From Probes to Rewards: Using Interpretability to Shape Training

Ekdeep Singh Lubana — Guest Speaker @ Cohere Labs AI Safety & Alignment Reading Group. Ekdeep is MTS at Goodfire, previously research fellow at Harvard's Center for Brain Science. His recent work addresses some core issues with how we extra...

Open video →
