Worth checking before choosing or changing a subscription.
Scaling laws for neural language models
Read paper
Quick editorial signal
- Track this as an OpenAI update, not just a standalone headline.
- Check plan details before changing subscriptions or advising a team.
- Likely worth revisiting after people have used the release in practice.
Abstract
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
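For readers who want the shape of these relationships, the single-variable laws described in the abstract take a simple power-law form, and the overfitting behavior is governed by a combined model/data equation. The sketch below shows the functional forms only; the exponents \alpha_N, \alpha_D, \alpha_C and the constants N_c, D_c, C_c are empirical fits reported in the paper, not values asserted here.

% Functional forms only; all constants are empirical fits from the paper.
\[
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C_{\min}) = \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C},
\]
\[
L(N, D) = \left[\left(\frac{N_c}{N}\right)^{\alpha_N/\alpha_D} + \frac{D_c}{D}\right]^{\alpha_D}
\]

Here N is the model size (non-embedding parameters), D the dataset size in tokens, and C_{\min} the minimum compute needed to reach a given loss; the combined L(N, D) expression captures how overfitting depends jointly on model and dataset size.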
Authors
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
Related articles
- Deep double descent (Publication, Dec 5, 2019)
- How AI training scales (Milestone, Dec 14, 2018)
- Embedding AI into developer software (Mar 21, 2024)
Help shape what we cover next
Anonymous feedback, no frontend account needed.