ChatGPT VIDEO 17 December 2024

OpenAI DevDay 2024 | Community Spotlight | Sierra


YouTube


Realistic agent benchmarks with LLMs: Measuring the performance and reliability of AI agents is challenging, especially in dynamic, real-world scenarios that involve human interaction, such as customer service. Sierra used OpenAI's GPT-4 and GPT-4o models to generate synthetic data and scenarios that simulate human users interacting with a customer service agent, resulting in the creation of τ-bench. This session covers the technical challenges faced while creating the data and the benchmark, findings from evaluating multiple LLM-based agents on τ-bench, and a discussion of building dynamic agent evaluations with foundation models.
