Large-scale study of curiosity-driven learning


Article details

AI maker: OpenAI
Type: Article
Published: 13 August 2018

Why it matters

Quick editorial signal

Impact: Relevant if you build with AI tools, APIs, or coding agents.

Audience: Developers

Level: Expert

Track this as an OpenAI update, not just a standalone headline.
Useful for builders who need to understand API, coding, or workflow changes.
Likely worth revisiting after people have used the release in practice.


Abstract

Reinforcement learning algorithms rely on carefully engineering environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as reward signal. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, and a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments. (b) We investigate the effect of using different feature spaces for computing prediction error and show that random features are sufficient for many popular RL game benchmarks, but learned features appear to generalize better (e.g. to novel game levels in Super Mario Bros.). (c) We demonstrate limitations of the prediction-based rewards in stochastic setups. Game-play videos and code are at this https URL.
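The core idea in the abstract, using prediction error in a feature space as the reward, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a fixed random embedding `phi`, a linear forward model trained by SGD, and pure-Python lists instead of a neural network. All names (`make_random_features`, `LinearForwardModel`, `intrinsic_reward`, `demo`) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_random_features(obs_dim, feat_dim, rng):
    """Return a fixed (never trained) random embedding phi(obs)."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(obs_dim)) for _ in range(obs_dim)]
               for _ in range(feat_dim)]

    def phi(obs):
        return [math.tanh(sum(w * o for w, o in zip(row, obs))) for row in weights]

    return phi


class LinearForwardModel:
    """Predicts phi(next_obs) from [phi(obs), one-hot action, bias].

    Assumption: a linear model stands in for a learned dynamics network.
    """

    def __init__(self, feat_dim, n_actions, lr=0.05):
        self.n_actions = n_actions
        self.lr = lr
        in_dim = feat_dim + n_actions + 1
        self.w = [[0.0] * in_dim for _ in range(feat_dim)]

    def _inputs(self, feat, action):
        one_hot = [1.0 if a == action else 0.0 for a in range(self.n_actions)]
        return feat + one_hot + [1.0]

    def predict(self, feat, action):
        x = self._inputs(feat, action)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.w]

    def update(self, feat, action, next_feat):
        """One SGD step on 0.5 * squared error; returns the error before the step."""
        x = self._inputs(feat, action)
        pred = [sum(w * xi for w, xi in zip(row, x)) for row in self.w]
        errs = [p - t for p, t in zip(pred, next_feat)]
        for row, e in zip(self.w, errs):
            for i, xi in enumerate(x):
                row[i] -= self.lr * e * xi
        return sum(e * e for e in errs)


def intrinsic_reward(model, phi, obs, action, next_obs):
    """Curiosity reward: squared prediction error in feature space."""
    pred = model.predict(phi(obs), action)
    target = phi(next_obs)
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def demo(steps=200, seed=0):
    """Reward for one repeated transition, before and after training."""
    rng = random.Random(seed)
    phi = make_random_features(obs_dim=4, feat_dim=8, rng=rng)
    model = LinearForwardModel(feat_dim=8, n_actions=2)
    obs, action, next_obs = [1.0, 0.0, -1.0, 0.5], 1, [0.5, -1.0, 0.25, 1.0]
    before = intrinsic_reward(model, phi, obs, action, next_obs)
    for _ in range(steps):
        model.update(phi(obs), action, phi(next_obs))
    after = intrinsic_reward(model, phi, obs, action, next_obs)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("reward before training:", before)
    print("reward after training:", after)
```

In this toy setting the reward for a transition shrinks as the forward model learns to predict it, so familiar transitions stop being rewarding while unseen ones stay rewarding. A transition whose outcome is random would keep producing prediction error no matter how long the model trains, which is one way to read the limitation in stochastic setups noted in point (c).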


Authors

Yura Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros


