OpenAI, 8 November 2018

Spinning Up in Deep RL



Illustration: Leandro Castelao

We’re releasing Spinning Up in Deep RL, an educational resource designed to let anyone learn to become a skilled practitioner in deep reinforcement learning. Spinning Up consists of crystal-clear examples of RL code, educational exercises, documentation, and tutorials.

At OpenAI, we believe that deep learning generally, and deep reinforcement learning specifically, will play central roles in the development of powerful AI technology. While there are numerous resources available to let people quickly ramp up in deep learning, deep reinforcement learning is more challenging to break into. We’ve designed Spinning Up to help people learn to use these technologies and to develop intuitions about them.

We were inspired to build Spinning Up through our work with the OpenAI Scholars and Fellows initiatives, where we observed that people with little to no experience in machine learning can rapidly ramp up as practitioners if the right guidance and resources are available to them. Spinning Up in Deep RL was built with this need in mind and is integrated into the curriculum for the 2019 cohorts of Scholars and Fellows.

We’ve also seen that being competent in RL can help people participate in interdisciplinary research areas like AI safety, which involve a mix of reinforcement learning and other skills. So many people have asked us for guidance in learning RL from scratch that we’ve decided to formalize the informal advice we’ve been giving.

Spinning Up in Deep RL consists of the following core components:

* A short introduction to RL terminology, kinds of algorithms, and basic theory.

* An essay about how to grow into an RL research role.

* A curated list of important papers organized by topic.

* A well-documented code repo of short, standalone implementations of: Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC).

Support

We have the following support plan for this project:

* High-bandwidth software support period: For the first three weeks following release, we’ll move quickly on bug fixes, installation issues, and resolving errors or ambiguities in the docs. We’ll work hard to streamline the user experience to make it as easy as possible to self-study with Spinning Up.

* Major review in April 2019: Approximately six months after release, we’ll do a serious review of the state of the package based on feedback we receive from the community, and announce any plans for future modification.

Education at OpenAI

Spinning Up in Deep RL is part of a new education initiative at OpenAI which we’re ‘spinning up’ to ensure we fulfill one of the tenets of the OpenAI Charter: “seek to create a global community working together to address AGI’s global challenges”. We hope Spinning Up will allow more people to become familiar with deep reinforcement learning, and use it to help advance safe and broadly beneficial AI.

We’re going to host a workshop on Spinning Up in Deep RL at OpenAI San Francisco on February 2, 2019. The workshop will consist of 3 hours of lecture material and 5 hours of semi-structured hacking, project development, and breakout sessions, all supported by members of the technical staff at OpenAI. Ideal attendees have software engineering experience and have tinkered with ML, but no formal ML experience is required. If you’re interested in participating, please complete our short application. The application will close on December 8, 2018, and acceptances will be sent out on December 17, 2018.

If you want to help us push the limits of AI while communicating with and educating others, then consider applying to work at OpenAI.

Partnerships

We’re also going to work with other organizations to help us educate people using these materials. For our first partnership, we’re working with the Center for Human-Compatible AI (CHAI) at the University of California at Berkeley to run a workshop on deep RL in early 2019, similar to the planned Spinning Up workshop at OpenAI. We hope this will be the first of many.

Hello World

The best way to get a feel for how deep RL algorithms perform is to just run them. With Spinning Up, that’s as easy as:

    python -m spinup.run ppo --env CartPole-v1 --exp_name hello_world

At the end of training, you’ll get instructions on how to view data from the experiments and watch videos of your trained agent.
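If you would rather launch that same command from a script, for example to try several experiment names in a row, one option is to assemble the argument list in Python. This is only a sketch: the helper name `build_spinup_command` is hypothetical and not part of Spinning Up, and the only flags used are the ones in the command above.

```python
import shlex


def build_spinup_command(algo, env, exp_name):
    """Assemble the argument list for the command-line launcher shown above.

    Hypothetical helper, not part of Spinning Up. It only arranges the
    algorithm name and the --env / --exp_name flags from the example.
    """
    return ["python", "-m", "spinup.run", algo,
            "--env", env, "--exp_name", exp_name]


cmd = build_spinup_command("ppo", "CartPole-v1", "hello_world")

# Print the command in copy-pasteable form.
print(shlex.join(cmd))

# To actually start training (requires Spinning Up to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if an experiment name ever contains spaces.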

Spinning Up implementations are compatible with Gym environments from the Classic Control, Box2D, or MuJoCo task suites.
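For readers who have not used Gym before, the sketch below shows the shape of the interaction loop that such environments expose: `reset()` returns a first observation, and `step(action)` returns an observation, a reward, a done flag, and an info dictionary. `ToyCorridorEnv` is a hypothetical stand-in written for this illustration so the example runs without Gym installed. It is not one of the task suites named above, and the four-value `step` return is an assumption that matches the Gym interface at the time of this release.

```python
class ToyCorridorEnv:
    """Hypothetical stand-in that mimics the classic Gym reset/step interface.

    The agent starts at position 0 and receives a reward of 1.0 when it
    reaches the end of a short corridor.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self):
        self.pos = 0
        self.t = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right; any other action moves left (clamped at 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.t += 1
        reached_goal = self.pos >= self.length
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.t >= self.max_steps
        return self.pos, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return the total (undiscounted) reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    return total


# A policy is just a function from observation to action.
print(run_episode(ToyCorridorEnv(), lambda obs: 1))
```

An RL algorithm replaces the fixed `lambda` with a learned policy and uses the collected rewards to improve it.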

We’ve designed the code for Spinning Up with newcomers in mind, making it short, friendly, and as easy to learn from as possible. Our goal was to write minimal implementations to demonstrate how the theory becomes code, avoiding the layers of abstraction and obfuscation typically present in deep RL libraries. We favor clarity over modularity—code reuse between implementations is strictly limited to logging and parallelization utilities. Code is annotated so that you always know what’s going on, and is supported by background material (and pseudocode) on the corresponding readthedocs page.
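As a small illustration of theory becoming code, consider the discounted reward-to-go, a quantity that policy gradient methods commonly use to weight each action by the rewards that followed it: R_t = r_t + gamma * r_(t+1) + gamma^2 * r_(t+2) + ... The sketch below is written for this article and is not taken from the Spinning Up repository; the function name and the default discount factor are assumptions.

```python
def discounted_rewards_to_go(rewards, gamma=0.99):
    """Return R_t = sum over k >= t of gamma**(k - t) * rewards[k], for every t.

    Works backwards through the episode so each entry is computed in
    constant time from the one after it: R_t = r_t + gamma * R_(t+1).
    """
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


# Three steps of reward 1 with gamma = 0.5.
print(discounted_rewards_to_go([1.0, 1.0, 1.0], gamma=0.5))
```

The backward recursion is the one-line formula read right to left, which is the kind of direct correspondence between equations and code that a minimal implementation tries to preserve.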

Author

Joshua Achiam

Acknowledgments

Thanks to the many people who contributed to this launch: Alex Ray, Amanda Askell, Ashley Pilipiszyn, Ben Garfinkel, Catherine Olsson, Christy Dennison, Coline Devin, Daniel Zeigler, Dylan Hadfield-Menell, Eric Sigler, Ge Yang, Greg Khan, Ian Atha, Jack Clark, Jonas Rothfuss, Larissa Schiavo, Leandro Castelao, Lilian Weng, Maddie Hall, Matthias Plappert, Miles Brundage, Peter Zokhov & Pieter Abbeel.
