ChatGPT VIDEO 8 October 2025

Measuring Agents With Interactive Evaluations

Agents must explore, plan, and execute reliably across diverse, long-horizon tasks. Static benchmarks can't measure these capabilities.

Hear from Greg Kamradt, President of the ARC Prize Foundation, on how evaluating agentic performance requires interactive evaluations.