GPT-2: 1.5B release

Quick editorial signal
- Worth checking before choosing or changing a subscription.
- Track this as an OpenAI update, not just a standalone headline.
- Check plan details before changing subscriptions or advising a team.
- Likely worth revisiting after people have used the release in practice.
November 5, 2019
Release
GPT‑2: 1.5B release
Read paper · GPT-2 model · Detector model
Illustration: Ben Barry
More Resources
Model card
As the final model release of GPT‑2’s staged release, we’re releasing the largest version (1.5B parameters) of GPT‑2 along with code and model weights to facilitate detection of outputs of GPT‑2 models. While there have been larger language models released since August, we’ve continued with our original staged release plan in order to provide the community with a test case of a full staged release process. We hope that this test case will be useful to developers of future powerful models, and we’re actively continuing the conversation with the AI community on responsible publication.
Our findings
1. Humans find GPT‑2 outputs convincing. Our partners at Cornell University surveyed people to assign GPT‑2 text a credibility score across model sizes. People gave the 1.5B model a “credibility score” of 6.91 out of 10. This is marginally greater than outputs from the 774M model (6.72) and significantly above the medium 355M model (6.07). These results make us more inclined to release the 1.5B model, as the incremental increase in human-perceived credibility relative to 774M seems low.
2. GPT‑2 can be fine-tuned for misuse. Our partners at the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism (CTEC) found that extremist groups can use GPT‑2 for misuse, specifically by fine-tuning GPT‑2 models on four ideological positions: white supremacy, Marxism, jihadist Islamism, and anarchism. CTEC demonstrated that it’s possible to create models that can generate synthetic propaganda for these ideologies. They also show that, despite having low detection accuracy on synthetic outputs, ML-based detection methods can give experts reasonable suspicion that an actor is generating synthetic text.
3. Detection is challenging. We expect that content-based detection of synthetic text is a long-term challenge. To test whether machine learning approaches may help today, we conducted in-house detection research and developed a detection model that has detection rates of ~95% for detecting 1.5B GPT‑2‑generated text.[A] We believe this is not high enough accuracy for standalone detection and needs to be paired with metadata-based approaches, human judgment, and public education to be more effective. We are releasing this model to aid research into the detection of synthetic text, although this does let adversaries with access better evade detection.
While we found detection accuracy depends heavily on the sampling methods used in training and testing, we also found detection to be more reliable when training across a range of sampling techniques. As seen in the table below, we observed that larger models’ outputs are more difficult to classify, but training on larger models’ outputs makes detection results more accurate and robust. We expect this trend to continue and that detection will be more challenging with increased model size. A minimal generation-and-detection sketch follows the table.
Transferred model accuracy (nucleus samples)
| Trained on ↓ / Tested on → | Small (124M) | Medium (355M) | Large (774M) | XL (1.5B) |
| --- | --- | --- | --- | --- |
| Small (124M) | 99.3% | 96.6% | 90.9% | 79.3% |
| Medium (355M) | 99.0% | 98.5% | 96.9% | 91.8% |
| Large (774M) | 98.4% | 97.9% | 97.9% | 95.7% |
| XL (1.5B) | 96.9% | 96.7% | 96.6% | 96.0% |
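To make the detection finding concrete, here is a minimal sketch, not OpenAI’s evaluation harness: it generates short completions under a few sampling settings and scores each with a RoBERTa-based detector. The checkpoint IDs are assumptions; the small 124M “gpt2” checkpoint stands in for the 1.5B model, and “roberta-base-openai-detector” for the released detector weights.

```python
# A minimal sketch, not OpenAI's evaluation harness. Checkpoint IDs are
# assumptions: "gpt2" (124M) stands in for the 1.5B model, and
# "roberta-base-openai-detector" for the released detector weights.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
detector = pipeline("text-classification", model="roberta-base-openai-detector")

prompt = "The city council announced today that"
sampling_settings = {
    "greedy": dict(do_sample=False),
    "top-k 40": dict(do_sample=True, top_k=40),
    "nucleus p=0.9": dict(do_sample=True, top_p=0.9),  # the regime in the table
}

for name, cfg in sampling_settings.items():
    out = generator(prompt, max_new_tokens=40, pad_token_id=50256, **cfg)
    text = out[0]["generated_text"]
    verdict = detector(text)[0]  # e.g. {'label': 'Fake', 'score': 0.98}
    print(f"{name:>14}: {verdict['label']} ({verdict['score']:.2f})")
```

Note how the sampling setting changes what the detector sees: the table above suggests a classifier trained on one regime (here, nucleus samples) transfers imperfectly to others.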
4. We’ve seen no strong evidence of misuse so far. While we’ve seen some discussion around GPT‑2’s potential to augment high-volume/low-yield operations like spam and phishing, we haven’t seen evidence of writing code, documentation, or instances of misuse. We think synthetic text generators have a higher chance of being misused if their outputs become more reliable and coherent. We acknowledge that we cannot be aware of all threats, and that motivated actors can replicate language models without model release.
5. We need standards for studying bias. Language models have biases. Working out how to study these biases, discuss them, and address them is a challenge for the AI research community. We’ve approached the challenge of bias in two ways:
* Publishing a model card[B] alongside our models on GitHub to give people a sense of the issues inherent to language models such as GPT‑2.
* Performing a qualitative, in-house evaluation of some of the biases in GPT‑2: we probed GPT‑2 for some gender, race, and religious biases, using those findings to inform our model card. These probes are not comprehensive and raise the need for collaboration on bias analysis frameworks; a minimal probing sketch follows this list.
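As a flavor of the second bullet, here is a minimal probing sketch under stated assumptions, not our internal evaluation: it swaps a demographic term in an otherwise fixed prompt and compares sampled completions from the small “gpt2” checkpoint. The prompt template is a common occupational-association probe, not necessarily the one we used.

```python
# A minimal bias-probing sketch, not OpenAI's internal evaluation: swap a
# demographic term in an otherwise fixed prompt and compare completions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # 124M stand-in

template = "The {group} worked as a"
for group in ("man", "woman"):
    outs = generator(
        template.format(group=group),
        max_new_tokens=8,
        num_return_sequences=5,
        do_sample=True,
        top_p=0.9,
        pad_token_id=50256,
    )
    print(group, [o["generated_text"] for o in outs])
```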
Next steps
Our experience with GPT‑2 over the past 9 months has given us valuable insight into the challenges and opportunities for creating responsible publication norms in AI. We’re continuing our work on this issue via participation in the Partnership on AI’s “Responsible Publication Norms for Machine Learning” project and discussions with our colleagues in the research community.
_If you’d like to develop large-scale AI systems and think about their implications, we’re hiring._
Footnotes
[A] Specifically, we based a sequence classifier on RoBERTa-base (125 million parameters) and RoBERTa-large (355 million parameters) and fine-tuned it to classify the outputs from the 1.5B GPT-2 model versus WebText, the dataset we used to train the GPT-2 model.
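A minimal sketch of the setup this footnote describes, with placeholder strings standing in for the real GPT-2 samples and WebText excerpts; a real run iterates this update over many balanced batches.

```python
# A minimal sketch of footnote A's setup: fine-tune a RoBERTa sequence
# classifier to separate GPT-2 samples (label 1) from WebText (label 0).
# The two strings below are placeholders, not the actual training data.
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizerFast

tok = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

texts = ["a sampled GPT-2 output ...", "a human-written WebText excerpt ..."]
labels = torch.tensor([1, 0])

batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss  # cross-entropy over the two classes
loss.backward()
optimizer.step()  # one update; training loops over many such batches
```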
[B] Which we’ve based on “Model Cards for Model Reporting” by Mitchell et al.