New and improved content moderation tooling

To help developers protect their applications against possible misuse, we are introducing the faster and more accurate Moderation endpoint. This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content. It is one instance of using AI systems to assist with human supervision of these systems. We have also released both a technical paper describing our methodology and the dataset used for evaluation.

When given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, or violent, or promotes self-harm, all of which our content policy prohibits. The endpoint has been trained to be fast and accurate and to perform robustly across a range of applications. Importantly, this reduces the chances of products "saying" the wrong thing, even when deployed to users at scale. As a consequence, AI can unlock benefits in sensitive settings, like education, where it could not otherwise be used with confidence.
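The endpoint returns its assessment as JSON. The minimal sketch below shows how an application might turn that assessment into a block/allow decision. It assumes the response shape described in the public API reference: a `results` list whose entries carry a boolean `flagged`, a `categories` map of booleans, and a `category_scores` map. The helper names and the sample response are hypothetical and for illustration only. The live API may also return subcategories beyond the four named in this post, and this sketch ignores them.

```python
from typing import Any

# The four top-level categories named in this post. The live API may return
# additional subcategories (e.g. "hate/threatening"); this sketch ignores them.
POST_CATEGORIES = ("sexual", "hate", "violence", "self-harm")


def flagged_categories(result: dict[str, Any]) -> list[str]:
    """Return the post's categories that a single moderation result marks True."""
    categories = result.get("categories", {})
    return [name for name in POST_CATEGORIES if categories.get(name, False)]


def should_block(response: dict[str, Any]) -> bool:
    """Block the content if any entry in the response's "results" list is flagged."""
    return any(item.get("flagged", False) for item in response.get("results", []))


# Hand-written, illustrative response; NOT real output from the API.
EXAMPLE_RESPONSE = {
    "results": [
        {
            "flagged": True,
            "categories": {"sexual": False, "hate": False, "violence": True, "self-harm": False},
            "category_scores": {"sexual": 0.01, "hate": 0.02, "violence": 0.91, "self-harm": 0.0},
        }
    ]
}

if __name__ == "__main__":
    print(should_block(EXAMPLE_RESPONSE))                    # True
    print(flagged_categories(EXAMPLE_RESPONSE["results"][0]))  # ['violence']
```

Keeping the decision in a small pure function makes it easy to test without network access. It also makes it easy to swap in a stricter policy later, for example one based on `category_scores`.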

[Figure: input text is sent to the Moderation endpoint, which checks it against the Violence, Self-harm, Hate, and Sexual categories and marks it as Flagged.]

The Moderation endpoint helps developers benefit from our infrastructure investments. Rather than building and maintaining their own classifiers, which is an extensive process, as we document in our paper, they can access accurate classifiers through a single API call.
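The minimal sketch below shows what that single API call can look like using only the Python standard library. It assumes the `https://api.openai.com/v1/moderations` URL and the `input` request field from the public API reference. It also assumes a bearer API key read from `OPENAI_API_KEY`, the environment variable OpenAI's official SDKs also use. The function names are hypothetical. Check the documentation for the current URL and the optional model parameter, which this sketch leaves at its default.

```python
import json
import os
import urllib.request

# Endpoint URL as given in the public API reference; verify against current docs.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the endpoint to classify `text`."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def moderate(text: str) -> dict:
    """Send the request and return the decoded JSON response.

    Requires network access and an API key in the OPENAI_API_KEY variable.
    """
    request = build_moderation_request(text, os.environ["OPENAI_API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Building the request separately from sending it lets the request be inspected or tested offline. `moderate("some user text")` returns the decoded response, whose `results` entries can then be checked for `flagged`. The official `openai` Python package wraps the same call as `client.moderations.create(input=...)`.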

As part of OpenAI's commitment to making the AI ecosystem safer, we are providing this endpoint to allow free moderation of all OpenAI API-generated content. For instance, Inworld, an OpenAI API customer, uses the Moderation endpoint to help its AI-based virtual characters remain appropriate for their audiences. By leveraging OpenAI's technology, Inworld can focus on its core product: creating memorable characters. We do not currently support monitoring of third-party traffic.

Get started with the Moderation endpoint by checking out the documentation. More details on the training process and model performance are available in our paper. We have also released an evaluation dataset of Common Crawl data labeled with these categories, which we hope will spur further research in this area.

Authors

Todor Markov, Chong Zhang, Sandhini Agarwal, Tyna Eloundou, Teddy Lee, Steven Adler, Angela Jiang, Lilian Weng
