27 April 2016

OpenAI Gym Beta

Read Paper


We’re releasing the public beta of OpenAI Gym, a toolkit for developing and comparing reinforcement learning (RL) algorithms. It consists of a growing suite of environments (from simulated robots to Atari games), and a site for comparing and reproducing results.

OpenAI Gym is compatible with algorithms written in any framework, such as TensorFlow and Theano. The environments are written in Python, but we’ll soon make them easy to use from any language. We originally built OpenAI Gym as a tool to accelerate our own RL research. We hope it will be just as useful for the broader community.

Getting started

If you’d like to dive in right away, you can work through our tutorial. You can also help out while learning by reproducing a result.
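
To give a taste of what the tutorial covers, here is a minimal sketch of the core interaction loop: a random agent on CartPole-v0, using the reset/step/render interface. It’s an illustration, not a substitute for the tutorial:

```python
import gym

# A minimal random agent on CartPole: reset the environment, then
# repeatedly sample an action and step the simulation forward.
env = gym.make('CartPole-v0')
observation = env.reset()
for _ in range(1000):
    env.render()                        # draw the current state
    action = env.action_space.sample()  # a random action
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()       # start a new episode
```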

Why RL?

Reinforcement learning (RL) is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn how to achieve goals in a complex, uncertain environment. It’s exciting for two reasons:

* RL is very general, encompassing all problems that involve making a sequence of decisions: for example, controlling a robot’s motors so that it’s able to run and jump, making business decisions like pricing and inventory management, or playing video games and board games. RL can even be applied to supervised learning problems with sequential or structured outputs.

* RL algorithms have started to achieve good results in many difficult environments. RL has a long history, but until recent advances in deep learning, it required lots of problem-specific engineering. DeepMind’s Atari results, BRETT from Pieter Abbeel’s group, and AlphaGo all used deep RL algorithms which did not make too many assumptions about their environment, and thus can be applied in other settings.

However, RL research is also slowed down by two factors:

* The need for better benchmarks. In supervised learning, progress has been driven by large labeled datasets like ImageNet. In RL, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of RL environments don’t have enough variety, and they are often difficult to even set up and use.

* Lack of standardization of environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task’s difficulty. This issue makes it difficult to reproduce published research and to compare results from different papers.

OpenAI Gym is an attempt to fix both problems.

The environments

OpenAI Gym provides a diverse suite of environments that range from easy to difficult and involve many different kinds of data. We’re starting out with the following collections:

* Classic control and toy text: complete small-scale tasks, mostly from the RL literature. They’re here to get you started.

* Algorithmic: perform computations such as adding multi-digit numbers and reversing sequences. One might object that these tasks are easy for a computer. The challenge is to learn these algorithms purely from examples. These tasks have the nice property that it’s easy to vary the difficulty by varying the sequence length.

* Atari: play classic Atari games. We’ve integrated the Arcade Learning Environment (which has had a big impact on reinforcement learning research) in an easy-to-install form.

* Board games: play Go on 9x9 and 19x19 boards. Two-player games are fundamentally different from the other settings we’ve included, because there is an adversary playing against you. In our initial release, there is a fixed opponent provided by Pachi, and we may add other opponents later (patches welcome!). We’ll also likely expand OpenAI Gym to have first-class support for multi-player games. (A short instantiation sketch follows this list.)
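
One point of the collection is that every environment exposes the same interface. As a rough illustration, the sketch below instantiates one environment from each collection and prints its action and observation spaces. The environment IDs are our recollection of the beta registry, and the Atari and Go environments assume the optional dependencies are installed; treat both as assumptions:

```python
import gym

# One environment from each collection above. SpaceInvaders-v0 and
# Go9x9-v0 assume the optional Atari and board-game dependencies
# are installed (e.g. pip install 'gym[atari,board_game]').
for env_id in ['CartPole-v0', 'Copy-v0', 'SpaceInvaders-v0', 'Go9x9-v0']:
    env = gym.make(env_id)
    print(env_id, env.action_space, env.observation_space)
```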

Over time, we plan to greatly expand this collection of environments. Contributions from the community are more than welcome.

Each environment has a version number (such as Hopper-v0). If we need to change an environment, we’ll bump the version number, defining an entirely new task. This ensures that results on a particular environment are always comparable.

Evaluations

We’ve made it easy to upload results to OpenAI Gym. However, we’ve opted not to create traditional leaderboards. What matters for research isn’t your score (it’s possible to overfit or hand-craft solutions to particular tasks), but instead the generality of your technique.
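
For context, the upload flow in the beta looks roughly like the sketch below: wrap a run with the environment’s monitor, which records episode statistics (and video) to a directory, then upload that directory. The monitor and upload names match the beta docs as we recall them, and the API key is a placeholder:

```python
import gym

# Record a short run with the environment monitor, then upload it.
env = gym.make('CartPole-v0')
env.monitor.start('/tmp/cartpole-experiment-1')  # write stats/videos here
for episode in range(20):
    observation = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()       # stand-in for a real agent
        observation, reward, done, info = env.step(action)
env.monitor.close()

# 'YOUR_API_KEY' is a placeholder for the key from your OpenAI Gym account.
gym.upload('/tmp/cartpole-experiment-1', api_key='YOUR_API_KEY')
```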

We’re starting out by maintaining a curated list of contributions that say something interesting about algorithmic capabilities. Long-term, we want this curation to be a community effort rather than something owned by us. We’ll necessarily have to figure out the details over time, and we’d love your help in doing so.

We want OpenAI Gym to be a community effort from the beginning. We’ve started working with partners to put together resources around OpenAI Gym:

* NVIDIA: technical Q&A with John.

* Nervana: implementation of a DQN OpenAI Gym agent.

During the public beta, we’re looking for feedback on how to make this into an even better tool for research. If you’d like to help, you can try your hand at improving the state-of-the-art on each environment, reproducing other people’s results, or even implementing your own environments. Also, please join us in the community chat!

Authors

Greg Brockman
