OpenAI · Article · 3 August 2017

Gathering human feedback




RL-Teacher is an open-source implementation of our interface to train AIs via occasional human feedback rather than hand-crafted reward functions. The underlying technique was developed as a step towards safe AI systems, but also applies to reinforcement learning problems with rewards that are hard to specify.

This simulated robot is being trained to do ballet via a human giving feedback. It’s not obvious how to specify a reward function to achieve the same behavior.

The release contains three main components:

* A reward predictor that can be plugged into any agent and learns to predict the actions the agent could take that a human would approve of.

* An example agent that learns via a function specified by a reward predictor. RL-Teacher ships with three pre-integrated algorithms, including OpenAI Baselines PPO.

* A web-app that humans can use to give feedback, providing the data used to train the reward predictor.
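To make the interplay between these components concrete, here is a minimal sketch of a reward predictor trained from pairwise human comparisons. It is not RL-Teacher's code. It assumes the human is shown two short trajectory segments and picks the better one, and it swaps in a linear model over hypothetical per-step features where a real predictor would be a neural network. All function names, the feature layout, and the simulated "human" are illustrative assumptions.

```python
import math
import random

def predicted_reward(weights, features):
    """Per-step reward estimate: a linear function of hypothetical state features."""
    return sum(w * x for w, x in zip(weights, features))

def segment_return(weights, segment):
    """Sum of predicted rewards over a segment (a list of per-step feature vectors)."""
    return sum(predicted_reward(weights, step) for step in segment)

def preference_probability(weights, left, right):
    """Modelled probability that a human prefers `left` over `right`."""
    diff = segment_return(weights, left) - segment_return(weights, right)
    diff = max(-30.0, min(30.0, diff))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-diff))

def train_reward_predictor(comparisons, n_features, lr=0.05, epochs=100):
    """Fit the weights by gradient descent on a cross-entropy loss.

    comparisons: list of (left_segment, right_segment, label), where
    label is 1.0 if the human preferred left and 0.0 if right.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for left, right, label in comparisons:
            p = preference_probability(weights, left, right)
            for i in range(n_features):
                feature_diff = sum(s[i] for s in left) - sum(s[i] for s in right)
                # Gradient of the cross-entropy loss with respect to weights[i].
                weights[i] -= lr * (p - label) * feature_diff
    return weights

# Demo with a simulated "human" (an assumption standing in for the web-app):
# they always prefer the segment with the larger total of feature 0.
rng = random.Random(0)

def random_segment(length=5, n_features=2):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(length)]

comparisons = []
for _ in range(40):
    left, right = random_segment(), random_segment()
    label = 1.0 if sum(s[0] for s in left) > sum(s[0] for s in right) else 0.0
    comparisons.append((left, right, label))

weights = train_reward_predictor(comparisons, n_features=2)
print("learned weights:", weights)
```

An agent would then call something like `predicted_reward(weights, features)` at each step in place of the environment's own reward, which is what makes the predictor pluggable into different RL algorithms.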

The entire system consists of fewer than 1,000 lines of Python code (excluding the agents). After you’ve set up your web server, you can launch an experiment by running:

Bash

$ python rl_teacher/teach.py -p human --pretrain_labels 175 -e Reacher-v1 -n human-175

Humans can give feedback via a simple web interface (shown above), which can be run locally (not recommended) or on a separate machine. Full documentation is available on the project’s GitHub repository. We’re excited to see what AI researchers and engineers do with this technology. Please get in touch with any experimental results!
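As a rough illustration of how answers collected through such an interface might become training data, the sketch below converts feedback records into numeric labels. The record fields, the response values (`left`, `right`, `tie`, `abstain`), and the clip identifiers are all hypothetical. They are not taken from RL-Teacher's actual schema, so consult the repository for the real format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Comparison:
    """One pair of clips shown to a human (hypothetical record layout)."""
    left_clip_id: str
    right_clip_id: str
    response: Optional[str] = None  # None until a human has answered

# Assumed encoding: the label is the target probability that "left" is better.
RESPONSE_TO_LABEL = {"left": 1.0, "right": 0.0, "tie": 0.5}

def to_training_labels(comparisons: List[Comparison]) -> List[Tuple[str, str, float]]:
    """Keep only answered, usable comparisons and attach a numeric label."""
    labeled = []
    for c in comparisons:
        # Unanswered (None) and "abstain" responses carry no training signal.
        if c.response in RESPONSE_TO_LABEL:
            labeled.append((c.left_clip_id, c.right_clip_id, RESPONSE_TO_LABEL[c.response]))
    return labeled

feedback = [
    Comparison("clip-1", "clip-2", "left"),
    Comparison("clip-3", "clip-4", "right"),
    Comparison("clip-5", "clip-6", "tie"),
    Comparison("clip-7", "clip-8", "abstain"),
    Comparison("clip-9", "clip-10"),  # still waiting for a human
]
training_labels = to_training_labels(feedback)
print(training_labels)
```

Treating a tie as a target of 0.5 is one possible design choice: it tells the predictor that both clips should score about the same, rather than discarding the answer.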


Authors

Tom Brown, Dario Amodei, Paul Christiano
