Spinning Up in Deep RL

Illustration: Leandro Castelao

We’re releasing Spinning Up in Deep RL, an educational resource designed to let anyone learn to become a skilled practitioner in deep reinforcement learning. Spinning Up consists of crystal-clear examples of RL code, educational exercises, documentation, and tutorials.

At OpenAI, we believe that deep learning generally—and deep reinforcement learning specifically—will play central roles in the development of powerful AI technology. While there are numerous resources available to let people quickly ramp up in deep learning, deep reinforcement learning is more challenging to break into. We’ve designed Spinning Up to help people learn to use these technologies and to develop intuitions about them.

We were inspired to build Spinning Up through our work with the OpenAI Scholars and Fellows initiatives, where we observed that it’s possible for people with little-to-no experience in machine learning to rapidly ramp up as practitioners, if the right guidance and resources are available to them. Spinning Up in Deep RL was built with this need in mind and is integrated into the curriculum for 2019 cohorts of Scholars and Fellows.

We’ve also seen that being competent in RL can help people participate in interdisciplinary research areas like AI safety, which involve a mix of reinforcement learning and other skills. We’ve had so many people ask for guidance in learning RL from scratch that we’ve decided to formalize the informal advice we’ve been giving.

Spinning Up in Deep RL consists of the following core components:

* A short introduction to RL terminology, kinds of algorithms, and basic theory.

* An essay about how to grow into an RL research role.

* A curated list of important papers organized by topic.

* A well-documented code repo of short, standalone implementations of: Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC).

Support

We have the following support plan for this project:

* High-bandwidth software support period: For the first three weeks following release, we’ll move quickly on bug fixes, installation issues, and resolving errors or ambiguities in the docs. We’ll work hard to streamline the user experience, in order to make it as easy as possible to self-study with Spinning Up.

* Major review in April 2019: Approximately six months after release, we’ll do a serious review of the state of the package based on feedback we receive from the community, and announce any plans for future modification.

Education at OpenAI

Spinning Up in Deep RL is part of a new education initiative at OpenAI which we’re ‘spinning up’ to ensure we fulfill one of the tenets of the OpenAI Charter: “seek to create a global community working together to address AGI’s global challenges”. We hope Spinning Up will allow more people to become familiar with deep reinforcement learning, and use it to help advance safe and broadly beneficial AI.

We’re going to host a workshop on Spinning Up in Deep RL at OpenAI San Francisco on February 2nd, 2019. The workshop will consist of 3 hours of lecture material and 5 hours of semi-structured hacking, project development, and breakout sessions, all supported by members of the technical staff at OpenAI. Ideal attendees have software engineering experience and have tinkered with ML, but no formal ML experience is required. If you’re interested in participating, please complete our short application here. The application will close on December 8th, 2018, and acceptances will be sent out on December 17th, 2018.

If you want to help us push the limits of AI while communicating with and educating others, then consider applying to work at OpenAI.

Partnerships

We’re also going to work with other organizations to help us educate people using these materials. For our first partnership, we’re working with the Center for Human-Compatible AI (CHAI) at the University of California at Berkeley to run a workshop on deep RL in early 2019, similar to the planned Spinning Up workshop at OpenAI. We hope this will be the first of many.

Hello World

The best way to get a feel for how deep RL algorithms perform is to just run them. With Spinning Up, that’s as easy as:

python -m spinup.run ppo --env CartPole-v1 --exp_name hello_world

At the end of training, you’ll get instructions on how to view data from the experiments and watch videos of your trained agent.

Spinning Up implementations are compatible with Gym environments from the Classic Control, Box2D, or MuJoCo task suites.
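For readers who have not used Gym before, the sketch below shows the kind of interaction loop that "compatible with Gym environments" refers to. It does not use Gym or Spinning Up: `ToyCorridorEnv` and `run_episode` are hypothetical names made up for this illustration. It assumes the classic Gym convention in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`.

```python
import random


class ToyCorridorEnv:
    """Hypothetical stand-in environment (not part of Gym or Spinning Up).

    It mimics the classic Gym interface:
    reset() -> obs, step(action) -> (obs, reward, done, info).
    The agent starts at position 0 and gets reward 1.0 on reaching `length`.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Action 1 moves right; any other action moves left (clamped at 0).
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        self.steps += 1
        reached_goal = self.position >= self.length
        reward = 1.0 if reached_goal else 0.0
        # The episode ends at the goal or when the step budget runs out.
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode; return (total reward, episode length)."""
    obs = env.reset()
    total_reward, length, done = 0.0, 0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        length += 1
    return total_reward, length


if __name__ == "__main__":
    rng = random.Random(0)
    ret, n = run_episode(ToyCorridorEnv(), lambda obs: rng.choice([0, 1]))
    print(f"random policy: return={ret}, length={n}")
```

Any environment exposing the same two methods could be dropped into `run_episode` unchanged, which is the sense in which an algorithm written against this interface works across task suites.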

We’ve designed the code for Spinning Up with newcomers in mind, making it short, friendly, and as easy to learn from as possible. Our goal was to write minimal implementations to demonstrate how the theory becomes code, avoiding the layers of abstraction and obfuscation typically present in deep RL libraries. We favor clarity over modularity—code reuse between implementations is strictly limited to logging and parallelization utilities. Code is annotated so that you always know what’s going on, and is supported by background material (and pseudocode) on the corresponding readthedocs page.
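To give a flavor of what "theory becomes code" can look like, here is a minimal sketch of the simplest policy gradient update. It is not taken from the Spinning Up repo, and it is deliberately simpler than the VPG implementation there: it uses the basic score-function (REINFORCE-style) estimator with no baseline, on a toy multi-armed bandit instead of a Gym task. The names `softmax` and `train_bandit_policy`, the reward noise, and the hyperparameters are all assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit_policy(arm_means, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy on a bandit with the score-function gradient.

    Theory: grad J = E[ R * grad log pi(a) ].
    For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(arm_means)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Noisy reward centered on the chosen arm's mean (illustrative choice).
        reward = arm_means[action] + rng.gauss(0.0, 0.1)
        # Single-sample gradient ascent step on the expected reward.
        for k in range(len(logits)):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * reward * grad_log_prob
    return softmax(logits)


if __name__ == "__main__":
    final_probs = train_bandit_policy([0.2, 0.8])
    print("final action probabilities:", final_probs)
```

The update line is a direct transcription of the gradient formula in the docstring. Practical implementations add pieces this sketch omits, such as neural network policies, multi-step trajectories, and advantage estimates to reduce variance.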

Author

Joshua Achiam

Acknowledgments

Thanks to the many people who contributed to this launch: Alex Ray, Amanda Askell, Ashley Pilipiszyn, Ben Garfinkel, Catherine Olsson, Christy Dennison, Coline Devin, Daniel Zeigler, Dylan Hadfield-Menell, Eric Sigler, Ge Yang, Greg Khan, Ian Atha, Jack Clark, Jonas Rothfuss, Larissa Schiavo, Leandro Castelao, Lilian Weng, Maddie Hall, Matthias Plappert, Miles Brundage, Peter Zokhov & Pieter Abbeel.
