Confidence-Building Measures for Artificial Intelligence: Workshop proceedings
Abstract
Foundation models could eventually introduce several pathways for undermining state security: accidents, inadvertent escalation, unintentional conflict, the proliferation of weapons, and interference with human diplomacy are just a few on a long list. The Confidence-Building Measures for Artificial Intelligence workshop, hosted by the Geopolitics Team at OpenAI and the Berkeley Risk and Security Lab at the University of California, brought together a multistakeholder group to think through the tools and strategies to mitigate the potential risks that foundation models pose to international security. Originating in the Cold War, confidence-building measures (CBMs) are actions that reduce hostility, prevent conflict escalation, and improve trust between parties. The flexibility of CBMs makes them a key instrument for navigating the rapid changes in the foundation model landscape. Participants identified the following CBMs, which directly apply to foundation models and are further explained in these proceedings: (1) crisis hotlines, (2) incident sharing, (3) model, transparency, and system cards, (4) content provenance and watermarks, (5) collaborative red teaming and table-top exercises, and (6) dataset and evaluation sharing. Because most foundation model developers are non-government entities, many CBMs will need to involve a wider stakeholder community. These measures can be implemented either by AI labs or by relevant government actors.
* Ethics & Safety
* Reasoning & Policy
* Community & Collaboration
Report authors, in order of contribution
Sarah Shoker (OpenAI)*, Andrew Reddie (University of California, Berkeley)†
Report authors, in alphabetical order
Sarah Barrington (University of California, Berkeley)
Ruby Booth (Berkeley Risk and Security Lab)
Miles Brundage (OpenAI)
Husanjot Chahal (OpenAI)
Michael Depp (Center for a New American Security)
Bill Drexel (Center for a New American Security)
Marina Favaro (Anthropic)
Ritwik Gupta (University of California, Berkeley)
Jake Hecla (University of California, Berkeley)
Alan Hickey (OpenAI)
Margarita Konaev (Center for Security and Emerging Technology)
Kirthi Kumar (University of California, Berkeley)
Nathan Lambert (Hugging Face)
Andrew Lohn (Center for Security and Emerging Technology)
Cullen O'Keefe (OpenAI)
Nazneen Rajani (Hugging Face)
Michael Sellitto (Anthropic)
Robert Trager (Centre for the Governance of AI)
Leah Walker (University of California, Berkeley)
Alexa Wehsener (Institute for Security and Technology)
Jessica Young (Microsoft)
All authors provided substantive contributions to the paper through sharing their ideas as participants in the workshop, writing the paper, and/or editorial feedback and direction. The first two authors are listed in order of contribution, and the remaining authors are listed alphabetically. Some workshop participants have chosen to remain anonymous. The claims in this paper do not represent the views of any author’s organization. For questions about this paper, contact Sarah Shoker at sshoker@openai.com and Andrew Reddie at areddie@berkeley.edu.
*Significant contribution, including writing, providing detailed input for the paper, research, workshop organization, and setting the direction of the paper.
†Significant contribution, including providing detailed input for the paper, research, workshop organization, and setting the direction of the paper.