TruthfulQA: Measuring how models mimic human falsehoods TruthfulQA: Measuring how models mimic human falsehoods

Read paper(opens in a new window) Read paper(opens in a new window)

Article details Artikelgegevens

AI maker AI-maker OpenAI Type Type Article Artikel Published Gepubliceerd 8 September 2021 8 september 2021 Updates Updates Videos Video's View original article Bekijk origineel artikel

Why it matters Waarom dit telt

Quick editorial signal Snelle redactionele duiding

1 min

Impact Impact

A product update that may change what people can do with AI this week. Een productupdate die kan veranderen wat mensen deze week met AI kunnen doen.

Audience Voor wie AI users AI-gebruikers

Level Niveau Expert Expert

Track this as a OpenAI update, not just a standalone headline. Bekijk dit als OpenAI-update, niet alleen als losse headline.
Good signal for whether this topic deserves a deeper guide later. Goed signaal of dit onderwerp later een uitgebreidere gids verdient.
Likely worth revisiting after people have used the release in practice. Waarschijnlijk de moeite waard om opnieuw te bekijken zodra mensen het in praktijk gebruiken.

model

Abstract

We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. We tested GPT‑3, GPT‑Neo/J, GPT‑2 and a T5-based model. The best model was truthful on 58% of questions, while human performance was 94%. Models generated many false answers that mimic popular misconceptions and have the potential to deceive humans. The largest models were generally the least truthful. This contrasts with other NLP tasks, where performance improves with model size. However, this result is expected if false answers are learned from the training distribution. We suggest that scaling up models alone is less promising for improving truthfulness than fine-tuning using training objectives other than imitation of text from the web.

We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. We tested GPT‑3, GPT‑Neo/J, GPT‑2 and a T5-based model. The best model was truthful on 58% of questions, while human performance was 94%. Models generated many false answers that mimic popular misconceptions and have the potential to deceive humans. The largest models were generally the least truthful. This contrasts with other NLP tasks, where performance improves with model size. However, this result is expected if false answers are learned from the training distribution. We suggest that scaling up models alone is less promising for improving truthfulness than fine-tuning using training objectives other than imitation of text from the web.

Authors

Stephanie Lin, Jacob Hilton, Owain Evans

View all

Building agricultural database for farmers Jan 12, 2024

Creating websites in minutes with AI Website Builder May 29, 2025

Delivering LLM-powered health solutions Jan 4, 2024

Help shape what we cover next Help bepalen wat we hierna volgen

Anonymous feedback, no frontend account needed. Anonieme feedback, zonder front-end account.

Share article Deel artikel

TruthfulQA: Measuring how models mimic human falsehoods TruthfulQA: Measuring how models mimic human falsehoods

Quick editorial signal Snelle redactionele duiding

Stephanie Lin, Jacob Hilton, Owain Evans

View all

Help shape what we cover next Help bepalen wat we hierna volgen

More from OpenAI Meer van OpenAI

Our principles Our principles

GPT-5.5 Bio Bug Bounty GPT-5.5 Bio Bug Bounty

How to get started with Codex Zo begin je met Codex

What is Codex? Wat is Codex?

TruthfulQA: Measuring how models mimic human falsehoods TruthfulQA: Measuring how models mimic human falsehoods

Quick editorial signal Snelle redactionele duiding

Stephanie Lin, Jacob Hilton, Owain Evans

View all

Help shape what we cover next Help bepalen wat we hierna volgen

More from OpenAI Meer van OpenAI

Our principles Our principles

GPT-5.5 Bio Bug Bounty GPT-5.5 Bio Bug Bounty

How to get started with Codex Zo begin je met Codex

What is Codex? Wat is Codex?

The Next Input keeps optional media off until you say yes. The Next Input houdt optionele media uit tot jij ja zegt.