We (along with researchers from Berkeley and Stanford) are co-authors on today’s paper led by Google Brain researchers, Concrete Problems in AI Safety. The paper explores many research problems around ensuring that modern machine learning systems operate as intended. (The problems are very practical, and we’ve already seen some being integrated into OpenAI Gym.)
Advancing AI requires making AI systems smarter, but it also requires preventing accidents: ensuring that AI systems do what people actually want them to do. There’s been an increasing focus on safety research from the machine learning community, such as a recent paper from DeepMind and FHI. Still, many machine learning researchers have wondered just how much safety research can be done today.
The authors discuss five areas (minimal code sketches of the first four follow the list):
* Safe exploration. _Can reinforcement learning (RL) agents learn about their environment without executing catastrophic actions?_ For example, can an RL agent learn to navigate an environment without ever falling off a ledge?
* Robustness to distributional shift. _Can machine learning systems be robust to changes in the data distribution, or at least fail gracefully?_ For example, can we build image classifiers that indicate appropriate uncertainty when shown new kinds of images, instead of confidently applying their potentially inapplicable learned models?
* Avoiding negative side effects. _Can we transform an RL agent’s reward function to avoid undesired effects on the environment?_ For example, can we build a robot that will move an object while avoiding knocking anything over or breaking anything, without manually programming a separate penalty for each possible bad behavior?
* Avoiding “reward hacking” and “wireheading”. _Can we prevent agents from “gaming” their reward functions, such as by distorting their observations?_ For example, can we train an RL agent to minimize the number of dirty surfaces in a building, without causing it to avoid looking for dirty surfaces or to create new dirty surfaces to clean up?
* Scalable oversight. _Can RL agents efficiently achieve goals for which feedback is very expensive?_ For example, can an agent learn to clean a room the way the user wants, even though feedback from the user is too costly to collect for every action?
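To make the safe-exploration question concrete, here is a minimal sketch in the style of the classic OpenAI Gym API (the 4-tuple `step` interface). It assumes a hypothetical `is_catastrophic` checker; the hard research problem is building such a checker, or avoiding the need for one, rather than wiring it in:

```python
import gym

def is_catastrophic(observation, action):
    # Hypothetical predicate for illustration only. In a real system
    # this would be a learned or hand-specified model of which actions
    # are irreversibly bad (e.g., steps that take the agent off a ledge).
    return False

class SafeExplorationWrapper(gym.Wrapper):
    """Blocks actions flagged as catastrophic, substituting a no-op.

    This is one naive baseline (a hard constraint from an external
    checker); the paper surveys several richer approaches.
    """
    def __init__(self, env, noop_action):
        super().__init__(env)
        self.noop_action = noop_action
        self._last_obs = None

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, action):
        if is_catastrophic(self._last_obs, action):
            action = self.noop_action  # override the unsafe choice
        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs
        return obs, reward, done, info

# Usage:
#   env = SafeExplorationWrapper(gym.make("CartPole-v1"), noop_action=0)
```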
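For distributional shift, one crude baseline is to measure disagreement across an ensemble of classifiers and abstain when it is high. This sketch assumes models exposing a scikit-learn-style `predict_proba`; the names and threshold are illustrative, not from the paper:

```python
import numpy as np

def predict_with_uncertainty(models, x, threshold=0.2):
    """Flag inputs where an ensemble of classifiers disagrees.

    `models` is any list of objects with a predict_proba(x) method
    (e.g., scikit-learn classifiers trained on bootstrapped samples).
    High variance across members is a rough signal that x may lie
    outside the training distribution.
    """
    probs = np.stack([m.predict_proba(x) for m in models])  # (M, N, C)
    mean = probs.mean(axis=0)                  # average prediction
    disagreement = probs.std(axis=0).max(axis=1)  # per-example spread
    abstain = disagreement > threshold         # defer instead of guessing
    return mean.argmax(axis=1), abstain
```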
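The negative-side-effects bullet asks for a general impact penalty in place of hand-coded rules. Here is a toy version, assuming states are dicts mapping object names to positions (our simplification, not the paper’s formalism):

```python
def environment_change(state_before, state_after):
    # Toy impact measure: count objects whose position changed.
    # Designing a good measure is itself the open research question.
    return sum(
        1 for k in state_before
        if state_before[k] != state_after.get(k)
    )

def shaped_reward(task_reward, state_before, state_after, penalty_coef=0.1):
    # Subtract a penalty proportional to how much the agent disturbed
    # the environment, instead of hand-coding one penalty per bad behavior.
    return task_reward - penalty_coef * environment_change(state_before, state_after)
```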
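Finally, reward hacking arises whenever the reward is computed from the agent’s own observations rather than from the true world state. The cleaning example reduces to a gap like this (a deliberately simplified illustration):

```python
def proxy_reward(observed_dirty_surfaces):
    # Computed from what the agent observes: an agent that stops looking
    # for dirt sees zero dirty surfaces and scores perfectly without
    # cleaning anything.
    return -len(observed_dirty_surfaces)

def intended_reward(actual_dirty_surfaces):
    # The intended objective depends on the true state of the building,
    # which the agent's sensors only approximate; the mismatch between
    # these two functions is what the agent can exploit.
    return -len(actual_dirty_surfaces)
```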
Many of the problems are not new, but the paper explores them in the context of cutting-edge systems. We hope they’ll inspire more people to work on AI safety research, whether at OpenAI or elsewhere.
We’re particularly excited to have participated in this paper as a cross-institutional collaboration. We think that broad AI safety collaborations will enable everyone to build better machine learning systems. Let us know if you have a future paper you’d like to collaborate on!
Authors
Paul Christiano, Greg Brockman