ChatGPT VIDEO 16 June 2026

Why Tejal Patwardhan stopped underestimating the models - Episode 21

The old tests are getting too easy. Tejal Patwardhan leads OpenAI’s frontier evals team, which is finding new ways to measure and forecast progress as models become more capable. She and host Andrew Mayne discuss why evals matter for research, how benchmarks can break or get gamed, and what models need to be judged on next.

Chapters

00:00:24 Growing up at OpenAI
00:03:10 Why reasoning changed everything
00:06:28 What made o1 surprising
00:11:20 Why old benchmarks stopped working
00:14:45 What makes a good benchmark
00:17:35 Why evals are getting harder
00:22:09 Measuring voice and vision models
00:24:48 Testing models on real science
00:33:23 How OpenAI tracks frontier progress
00:40:47 What AI means for work
