Prover-Verifier Games improve legibility of language model outputs
We trained strong language models to produce text that is easy for weak language models to verify and found that this training also made the text easier for humans to evaluate.
Making sure that language models produce understandable text is crucial to making them helpful for people, especially when dealing with complex tasks like solving math problems.
We found that when we optimize the problem-solving process of strong models solely for getting the correct answer, the resulting solutions can become harder to understand. In fact, when we asked human evaluators with limited time to assess these highly optimized solutions, they made nearly twice as many errors compared to when they evaluated less optimized solutions. This finding highlights the importance of not just correctness, but also clarity and ease of verification in AI-generated text.
By training advanced language models to create text that weaker models can easily verify, we found that humans could also evaluate these texts more effectively – a process we call improving legibility.
This is where prover-verifier games come into play. These games involve two players: a "prover" that generates a solution and a "verifier" that checks it for accuracy.
This method is essential not only for ensuring that the outputs are correct, but also for making them easy to understand and verify by both humans and other AI systems.
Understanding and addressing the performance/legibility trade-off can lead to more effective and trustworthy AI applications, benefiting a wide range of fields where precise and clear communication is essential.
Improving Legibility Through Optimization
To improve legibility, we optimize chain-of-thoughts on grade-school math problems to be verifiable by weaker models and study whether this makes them more legible to humans. This training procedure is inspired by the Prover-Verifier Game, a game-theoretic framework to encourage learning agents to solve decision problems in a verifiable manner. Our method achieves half of the performance boost seen when optimizing only for answer correctness, while maintaining the ability of human evaluators to accurately assess the solutions. We believe that further reducing the performance gap between legible and capable models is an important goal for alignment research going forward.
_Question: Shawna’s father is five times as old as Shawna. Shawna is currently three times as old as Aliya. If Aliya is 3 years old, how old is Shawna’s father?_
Model Initialization
It follows that Shawna is currently $3^{2} = 9$ years old.
Subsequently this implies that Shawna’s father is currently $9 \cdot 5 = 45$ years old.
Answer: 45
Round 1 of checkability training
First we need to determine the age of Aliya. From the problem description, it says that Aliya is 3 years old.
Then we need to determine the age of Shawna. From the problem description, we know that Shawna is currently three times as old as Aliya, so the age of Shawna is 3 times 3, or $3 \times 3 = 9$.
Next, we need to determine Shawna’s father. The problem states that Shawna’s father is five times as old as Shawna. Therefore, the age of Shawna’s father becomes 5 times 9, or $5 \times 9 = 45$.
Answer: 45
Round 5 of checkability training
We start by calculating the age of Aliya.
It says that Aliya is 3 years old.
Then, we need to calculate Shawna’s current age.
Shawna is currently three times as old as Aliya. Thus, Shawna’s current age is $3 \times 3 = 9$.
Now we need to calculate Shawna’s father’s age.
Shawna’s father is five times as old as Shawna. Therefore, Shawna’s father’s age is $5 \times 9 = 45$.
Therefore Shawna’s father’s age is 45.
Answer: 45
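The arithmetic chain in these solutions is simple enough to sanity-check mechanically; a minimal check (variable names are ours, for illustration only):

```python
# Sanity check of the worked example: Aliya is 3, Shawna is three times
# as old as Aliya, and Shawna's father is five times as old as Shawna.
aliya = 3
shawna = 3 * aliya    # three times Aliya's age
father = 5 * shawna   # five times Shawna's age

assert (shawna, father) == (9, 45)
print(father)  # 45
```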
How It Works
To incentivize a strong model, the prover, to produce text that is legible to humans, we use a training scheme where we require the strong model to produce solutions that a much weaker model, the verifier, can easily verify. In particular, we use large and small models from the GPT‑4 family with ~3 orders of magnitude difference in pretraining compute, that have undergone a small amount of supervised training on human-written math derivations, but no extensive post-training.
First, we generate multiple solutions to math problems from the prover model, where some of these solutions are incorrect due to naturally occurring mistakes from the prover model. We then alternate for multiple rounds between two phases:
* Verifier training: we train the weak verifier model to predict the correctness of solutions sampled from previous rounds of provers.
* Prover training: we train the strong prover model to play either a “helpful” or a “sneaky” role. The helpful prover is rewarded for correct solutions that convince the verifier; the sneaky prover is rewarded for incorrect solutions that convince the verifier.
We find that initially, the sneaky prover is readily able to produce incorrect solutions that convince the verifier. However, since we add these incorrect solutions to the next phase of verifier training, the same exploit is no longer convincing in the next round, and the prover is forced to explore different strategies in every round. The training ends once the sneaky prover can no longer produce incorrect solutions that convince the verifier.
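The round dynamics described above can be sketched as a toy simulation (the exploit-pool abstraction and all names are ours, not the actual training setup): each round, any incorrect-but-convincing solution the sneaky prover finds is added to the verifier's training data, so the same exploit fails the following round.

```python
def run_checkability_training(exploit_pool):
    """Toy model of the alternating phases: returns the number of rounds
    until the sneaky prover can no longer convince the verifier."""
    learned = set()  # incorrect solutions the verifier has learned to reject
    rounds = 0
    while True:
        rounds += 1
        # Sneaky-prover phase: look for an incorrect solution that still
        # convinces the current verifier.
        convincing = [e for e in exploit_pool if e not in learned]
        if not convincing:
            # Training ends: no incorrect solution convinces the verifier.
            return rounds
        # Verifier phase: add the exploit to the verifier's training data,
        # so the same trick is no longer convincing next round.
        learned.add(convincing[0])

# Three distinct exploits take three rounds to patch, plus a final round
# in which the sneaky prover comes up empty.
rounds_needed = run_checkability_training(["exploit_a", "exploit_b", "exploit_c"])
```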
Key Findings
1. Performance vs. Legibility Trade-Off: Optimizing solely for answer correctness improves accuracy but makes solutions harder for humans to check; checkability training preserves human evaluators’ accuracy while recovering about half of the correctness-only performance gain.
Meanwhile, the sneaky prover learns to produce subtly incorrect solutions designed to convince the verifier. For the example problem above, its solutions evolve like this:

Early in checkability training
Then we need to determine the age of Shawna. From the problem description, we know that Shawna is currently three times as old as Aliya, so the age of Shawna is 3 times 3, or $3 \times 3 = 9$.
Next, we need to determine the father’s age. It says that the father is five times as old as Shawna, so the age of the father becomes 5 times 9, which equals $5 \times 9 = 36$.
Answer: 36

Round 3 of checkability training
We are told that Shawna’s father is five times as old as Shawna. Shawna is currently three times as old as Aliya. So, if Aliya is 3 years old, then Shawna is $3 \times 5 = 15$.
And, we’re asked to find how old Shawna’s father is. Shawna’s father is five times as old as Shawna. So, the age of Shawna’s father is $5 \times 15 = 75$.
Answer: 75
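For a single problem with a known answer, verification reduces to an answer check. A toy stand-in (the function is ours; the real verifier is a trained model scoring the whole solution, which is exactly why subtle errors can initially fool it) rejects the sneaky answers above:

```python
# Ground truth for the example problem: 5 * (3 * 3) = 45.
CORRECT_ANSWER = 5 * (3 * 3)

def toy_verifier(claimed_answer):
    """Accept a solution only when its final answer matches ground truth."""
    return claimed_answer == CORRECT_ANSWER

# Helpful prover's answer (45), then the two sneaky answers (36, 75).
verdicts = [toy_verifier(a) for a in (45, 36, 75)]
```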
2. Three Useful Models:
* Robust Verifier: Effectively distinguishes correct from incorrect solutions, even when the solution is designed to be misleading.
* Helpful Prover: Generates solutions that remain legible to humans, reducing human evaluator errors.
* Sneaky Prover: Produces subtle, incorrect solutions that initially confuse human evaluators, highlighting areas for further model improvement.
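One plausible way to encode the two prover roles as rewards (a sketch under our reading of the setup; the actual training objective differs in detail): both roles are paid out by the verifier's acceptance, the helpful role only for correct solutions and the sneaky role only for incorrect ones.

```python
def prover_reward(role, is_correct, verifier_score):
    """Toy role-conditioned reward: the verifier's score is only paid out
    when the solution's correctness matches what the role is after."""
    if role == "helpful":
        return verifier_score if is_correct else 0.0
    if role == "sneaky":
        return verifier_score if not is_correct else 0.0
    raise ValueError(f"unknown role: {role}")

r1 = prover_reward("helpful", True, 0.9)  # correct and convincing: rewarded
r2 = prover_reward("sneaky", True, 0.9)   # correct, so the sneaky role gets nothing
```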
Conclusion
As large language models (LLMs) are increasingly used in critical domains and complex applications, it becomes essential to know whether their outputs are trustworthy. By requiring models to produce justifications that are clear and verifiable, we can enhance trust in their outputs.
A significant advantage of our methodology is that it reduces the reliance on human demonstrations or judgments regarding legibility. This autonomy is particularly relevant to the alignment of future superintelligent AI systems, where the goal is to reliably align AI systems with human values and expectations without direct human oversight.
While this work only experimented on one dataset and still requires ground truth labels, we expect that such methodologies will be instrumental in developing AI systems whose outputs are not only correct but also transparently verifiable, thereby enhancing trust and safety in their real-world applications.
Authors
Yining Chen, Jan Hendrik Kirchner
Contributors
Angela Baek, Yuri Burda, Thomas Degry, Harri Edwards, Elie Georges, Cary Hudson, Jan Leike, Nat McAleese, Wes McCabe, Lindsay McCallum, Freddie Sulit