Safety Gym | The Next Input

Safety Gym

To study constrained RL for safe exploration, we developed a new set of environments and tools called Safety Gym. Compared with existing environments for constrained RL, Safety Gym environments are richer and feature a wider range of difficulty and complexity.

In all Safety Gym environments, a robot has to navigate through a cluttered environment to achieve a task. There are three pre-made robots (Point, Car, and Doggo), three main tasks (Goal, Button, and Push), and two levels of difficulty for each task. We give an overview of the robot-task combinations below, but make sure to check out the paper for details.
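The robot, task, and difficulty grid described above can be enumerated in a few lines. This is a minimal sketch that assumes the `Safexp-{Robot}{Task}{Level}-v0` naming pattern; the exact strings are an assumption here, so check them against the Safety Gym repository before relying on them.

```python
from itertools import product

ROBOTS = ["Point", "Car", "Doggo"]
TASKS = ["Goal", "Button", "Push"]
LEVELS = [1, 2]  # the two difficulty levels mentioned in the text


def env_names():
    """Build one name per robot/task/level combination.

    The naming pattern is an assumption, not confirmed by this post.
    """
    return [
        f"Safexp-{robot}{task}{level}-v0"
        for robot, task, level in product(ROBOTS, TASKS, LEVELS)
    ]


if __name__ == "__main__":
    for name in env_names():
        print(name)
```

Three robots, three tasks, and two levels give 18 combinations in total.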

In these videos, we show how an agent without constraints tries to solve these environments. Every time the robot does something unsafe (which here means running into clutter), a red warning light flashes around the agent, and the agent incurs a cost (separate from the task reward). Because these agents are unconstrained, they often wind up behaving unsafely while trying to maximize reward.
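To make the separation between reward and cost concrete, here is a minimal sketch of an episode loop that tracks the two signals independently. `ToyCostEnv` is a hypothetical stand-in written for this example, not a Safety Gym environment, and reporting the cost under an `info["cost"]` key is an assumption about the interface.

```python
import random


class ToyCostEnv:
    """Hypothetical toy environment; not part of Safety Gym."""

    def __init__(self, horizon=100, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = float(action)  # moving faster earns more task reward
        # Moving faster also makes hitting clutter more likely.
        hit = self.rng.random() < 0.1 * abs(action)
        cost = 1.0 if hit else 0.0
        done = self.t >= self.horizon
        return 0.0, reward, done, {"cost": cost}


def run_episode(env, policy):
    """Return (episode_return, episode_cost), accumulated separately."""
    obs, done = env.reset(), False
    ep_return, ep_cost = 0.0, 0.0
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        ep_return += reward
        ep_cost += info.get("cost", 0.0)  # cost is never folded into reward
    return ep_return, ep_cost


if __name__ == "__main__":
    print(run_episode(ToyCostEnv(), lambda obs: 1.0))
```

An unconstrained agent optimizes only the first number, which is why it can accumulate a large second number along the way.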

Point is a simple robot constrained to the 2D plane, with one actuator for turning and another for moving forward or backward. Point has a small front-facing square, which helps with the Push task.

Car has two independently driven parallel wheels and a free-rolling rear wheel. For this robot, turning and moving forward or backward require coordinating both of the actuators.
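Why Car needs both actuators coordinated can be seen in an idealized differential-drive model. The sketch below is textbook kinematics, not Safety Gym's physics simulation; the function name, `track_width`, and `dt` are hypothetical and chosen only for illustration.

```python
import math


def differential_drive_step(x, y, heading, v_left, v_right,
                            track_width=1.0, dt=0.1):
    """Advance an idealized two-wheel robot by one time step.

    Forward speed is the mean of the wheel speeds; turn rate comes
    from their difference. This is an illustrative model only.
    """
    v = 0.5 * (v_left + v_right)
    omega = (v_right - v_left) / track_width
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    heading += omega * dt
    return x, y, heading


if __name__ == "__main__":
    # Equal wheel speeds: straight line, heading unchanged.
    print(differential_drive_step(0.0, 0.0, 0.0, 1.0, 1.0))
    # Opposite wheel speeds: turn in place, position unchanged.
    print(differential_drive_step(0.0, 0.0, 0.0, -1.0, 1.0))
```

In this model neither wheel alone sets direction or speed, which contrasts with Point, where turning and moving each have their own actuator.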

Doggo is a quadruped with bilateral symmetry. Each of its four legs has two controls at the hip, for azimuth and elevation relative to the torso, and one in the knee, controlling angle. A uniform random policy keeps the robot from falling over and generates travel.

Benchmark

To help make Safety Gym useful out of the box, we evaluated some standard RL and constrained RL algorithms on the Safety Gym benchmark suite: PPO, TRPO, Lagrangian penalized versions of PPO and TRPO, and Constrained Policy Optimization (CPO).
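For readers unfamiliar with Lagrangian penalized methods, the core idea is to subtract a cost penalty from the reward objective and adapt the penalty weight based on how far the measured cost is from a limit. The sketch below shows a generic version of that update; the function names, learning rate, and cost limit are hypothetical, and the released code may parameterize the multiplier differently.

```python
def update_multiplier(lam, avg_episode_cost, cost_limit, lr=0.05):
    """Gradient-ascent step on the Lagrange multiplier.

    The multiplier grows while the constraint is violated and shrinks
    otherwise, and is clipped so that it never becomes negative.
    """
    return max(0.0, lam + lr * (avg_episode_cost - cost_limit))


def penalized_objective(avg_return, avg_episode_cost, lam):
    """Scalar objective the policy optimizer (e.g. PPO or TRPO) maximizes."""
    return avg_return - lam * avg_episode_cost


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical per-epoch cost measurements against a limit of 25.
    for measured_cost in [60.0, 45.0, 30.0, 24.0, 20.0]:
        lam = update_multiplier(lam, measured_cost, cost_limit=25.0)
        print(round(lam, 3))
```

The policy update itself is unchanged; only the objective it sees is penalized, which is what makes this approach easy to layer on top of existing algorithms.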

Our preliminary results demonstrate the wide range of difficulty of Safety Gym environments: the simplest environments are easy to solve and allow fast iteration, while the hardest environments may be too challenging for current techniques. We also found that Lagrangian methods were surprisingly better than CPO, overturning a previous result in the field.

Below, we show learning curves for average episodic return and average episodic sum of costs. In our paper, we describe how to use these and a third metric (the average cost over training) to compare algorithms and measure progress.
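As a rough illustration of the three metrics, the sketch below computes them from a log of finished episodes. It assumes that "average cost over training" means total cost divided by total environment steps across the whole run; that reading, the log format, and the `window` parameter are assumptions made for this example, and the paper gives the authoritative definitions.

```python
def summarize(log, window=10):
    """Summarize a training log.

    log: list of (episode_return, episode_cost, episode_length) tuples,
         in the order the episodes were collected.
    window: number of most recent episodes used for the two
            per-episode averages (hypothetical choice).
    """
    if not log:
        raise ValueError("log must contain at least one episode")
    recent = log[-window:]
    total_cost = sum(cost for _, cost, _ in log)
    total_steps = sum(length for _, _, length in log)
    return {
        "avg_return": sum(ret for ret, _, _ in recent) / len(recent),
        "avg_episode_cost": sum(cost for _, cost, _ in recent) / len(recent),
        # Averaged over all of training, not just recent episodes.
        "cost_rate": total_cost / total_steps,
    }


if __name__ == "__main__":
    log = [(10.0, 4.0, 100), (20.0, 2.0, 100)]
    print(summarize(log, window=1))
```

The first two numbers describe how the current policy behaves, while the third also reflects how much cost was incurred while getting there.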

To facilitate reproducibility and future work, we’re also releasing the algorithm code we used to run these experiments as the Safety Starter Agents repo.

Open problems

There is still a lot of work to do on refining algorithms for constrained RL, and combining them with other problem settings and safety techniques. There are three things we are most interested in at the moment:

1. Improving performance on the current Safety Gym environments.

2. Using Safety Gym tools to investigate safe transfer learning and distributional shift problems.

3. Combining constrained RL with implicit specifications (like human preferences) for rewards and costs.

Our expectation is that, in the same way we today measure the accuracy or performance of systems at a given task, we’ll eventually measure the “safety” of systems as well. Such measures could feasibly be integrated into assessment schemes that developers use to test their systems, and could potentially be used by the government to create standards for safety.[A] We also hope that systems like Safety Gym can make it easier for AI developers to collaborate on safety across the AI sector via work on open, shared systems.

_If you’re excited to work on safe exploration problems with us, we’re hiring!_

* Software & Engineering

* Robotics

* Simulated Environments

Footnotes

A. OpenAI’s comments in response to a request for information from the US agency NIST regarding Artificial Intelligence Standards.

Authors

Joshua Achiam, Alex Ray, Dario Amodei

Acknowledgments

We gratefully acknowledge the many people who contributed to this release. Thanks to Christy Dennison, Ethan Knight, and Adam Stooke for discussions, research, and testing of Safety Gym along the way, and to Mira Murati for supporting the project team. Thanks to Karl Cobbe, Matthias Plappert, and Jacob Hilton for feedback on the paper. Thanks to Ashley Pilipiszyn, Ben Barry, Justin Jay Wang, Richard Perez, Jen DeRosa, and Greg Brockman for work on editing, designing, illustrating, and shipping this post. Thanks to Amanda Askell, Jack Clark, and Miles Brundage for discussions and blog post contributions on measurements for AI safety and policy implications. Thanks to Chris Hesse for liaising on open source release guidelines.
