Cohere VIDEO 6 April 2026
YouTube


Niloofar Mireshghallah - Contextual Integrity in LLMs Benchmarking


Video details
AI maker: Cohere
Published: 6 April 2026
Channel: Cohere
Playlist: Uploads from Cohere
Watch on YouTube

About this video

Abstract: As large language models integrate into daily workflows—from personal assistants to workplace tools—they handle sensitive information from multiple sources yet struggle to reason about what to share, with whom, and when. In this talk, we explore critical gaps in LLMs' privacy reasoning through complementary benchmarks. First, ConfAIde [ICLR 2024 Spotlight] reveals that even advanced models like GPT-4 inappropriately disclose private information in contexts where humans would maintain boundaries. Second, we extend this analysis, in CIMemories [ICLR 2026], to persistent memories—an increasingly adopted personalization feature—showing failures in handling compositional secrets with multiple attributes and contextual cues. We then present a data minimization framework [ICLR 2026] that formally defines the least privacy-revealing disclosure that maintains task utility. Our experiments show frontier models can tolerate up to 85% data redaction without losing functionality, yet they lack awareness of what information they actually need—leading to systematic oversharing. We conclude with techniques for restoring performance when privacy measures are applied, offering a path toward AI systems that respect contextual privacy norms while remaining useful.
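To give a feel for the data-minimization idea in the abstract—disclose the least information that still preserves task utility—here is a deliberately toy sketch. It is not the formal framework from the talk: the record fields, the stand-in "task", and the greedy field-dropping heuristic are all illustrative assumptions.

```python
# Toy illustration of data minimization: keep a field only if the
# downstream task's output changes when that field is redacted.
# Everything here (field names, the task, the utility check) is a
# made-up stand-in, not the framework presented in the talk.

def answer_city_question(record: dict) -> str:
    """Toy downstream task: report the user's city."""
    return record.get("city", "unknown")

def minimal_disclosure(record: dict, task, reference_output: str) -> dict:
    """Greedily redact fields; a field survives only if removing it
    would change the task output (i.e., the task actually needs it)."""
    kept = dict(record)
    for field in list(record):
        trial = {k: v for k, v in kept.items() if k != field}
        if task(trial) == reference_output:
            kept = trial  # field was not needed; redact it
    return kept

record = {"name": "Alice", "ssn": "123-45-6789", "city": "Seattle"}
reference = answer_city_question(record)
print(minimal_disclosure(record, answer_city_question, reference))
# → {'city': 'Seattle'}: only "city" is needed, the rest is redacted.
```

The abstract's finding that models tolerate high redaction rates while still oversharing corresponds, in this sketch, to the gap between what `minimal_disclosure` keeps and what a model would volunteer unprompted.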

Niloofar Mireshghallah is a Member of Technical Staff at humans&, working on building AI systems that model the long-term social good of people. Beginning Fall 2026, she will join Carnegie Mellon University as an Assistant Professor jointly appointed in the Language Technologies Institute (LTI) and the Department of Engineering & Public Policy (EPP), and will be a core member of CyLab. Previously, she was a Research Scientist in the Alignment group at Meta's Fundamental AI Research (FAIR) lab until November 2025, working on privacy-preserving AI systems and LLM safety. Before that, she was a post-doctoral scholar at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Yejin Choi and Yulia Tsvetkov. She received her Ph.D. in Computer Science from UC San Diego in 2023. Her research focuses on privacy-preserving AI systems, LLM policy and ethics, contextual integrity in AI, and AI for science and health. Niloofar's work has been recognized with the Tinker Academic Research Compute Grant (2025), Modal Academic Research Compute Grant (2025), NCWIT Collegiate Award (2020), finalist distinction in the Qualcomm Innovation Fellowship (2021), the Rising Star in Adversarial ML Award (2022), and selection for the Rising Stars in EECS workshop (2022).

Website: https://mireshghallah.github.io/

This session is brought to you by the Cohere Labs Open Science Community - a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Manuel Villanueva and Damani Leads of our Privacy, Security and Policy group for their dedication in organizing this event.

If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.

Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommunityApp).

More videos from Cohere

All videos
Shuo Li Liu - Coherence in RLHF Preference Data
Cohere
24 Apr 2026

Shuo Li Liu - Coherence in RLHF Preference Data

RLHF usually learns from pairwise comparisons, often through Bradley-Terry-style models. I will discuss what coherence requirements, such as Weak Stochastic Transitivity and the Weak Axiom of Revealed Preference, mean for preference trained...

Open video →
Jiafei Duan - Building Robotics Foundation Model with Reasoning in the loop
Cohere
24 Apr 2026

Jiafei Duan - Building Robotics Foundation Model with Reasoning in the loop

Scaling alone won't unlock general-purpose robotics. Integrating reasoning directly into robot learning (spatial, temporal, and failure-based) so robots can learn more from limited data and continuously self-improve is the path forward. Ji...

Open video →
Aashish Rai - Video Native Representations for 4D Gaussian Scenes
Cohere
20 Apr 2026

Aashish Rai - Video Native Representations for 4D Gaussian Scenes

Volumetric videos offer immersive 4D experiences, but remain difficult to reconstruct, store, and stream at scale. Existing Gaussian Splatting-based methods achieve high-quality reconstruction but break down on long sequences, temporal inco...

Open video →
Ekdeep Singh Lubana - From Probes to Rewards: Using Interpretability to Shape Training
Cohere
20 Apr 2026

Ekdeep Singh Lubana - From Probes to Rewards: Using Interpretability to Shape Training

Ekdeep Singh Lubana — Guest Speaker @ Cohere Labs AI Safety & Alignment Reading Group. Ekdeep is MTS at Goodfire, previously research fellow at Harvard's Center for Brain Science. His recent work addresses some core issues with how we extra...

Open video →
