
OpenAI · Article · 19 April 2024

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions


Why it matters

Quick editorial signal, 1 min

Impact: Relevant if you build with AI tools, APIs, or coding agents.

Audience: Developers

Level: Expert

Track this as an OpenAI update, not just a standalone headline.
Useful for builders who need to understand API, coding, or workflow changes.
Likely worth revisiting after people have used the release in practice.


Abstract

Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts. In this work, we argue that one of the primary vulnerabilities underlying these attacks is that LLMs often consider system prompts (e.g., text from an application developer) to be the same priority as text from untrusted users and third parties. To address this, we propose an instruction hierarchy that explicitly defines how models should behave when instructions of different priorities conflict. We then propose a data generation method to demonstrate this hierarchical instruction following behavior, which teaches LLMs to selectively ignore lower-privileged instructions. We apply this method to GPT‑3.5, showing that it drastically increases robustness -- even for attack types not seen during training -- while imposing minimal degradations on standard capabilities.
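To make the idea of conflicting priorities concrete, here is a minimal Python sketch of the outcome an instruction hierarchy aims for: when two instructions target the same thing, the one from the more privileged source wins. The names `Privilege`, `Instruction`, `resolve`, and the `setting` key are hypothetical, and the three-level ordering is an assumption based on the abstract's mention of system prompts, users, and third parties. The abstract describes teaching this behavior to the model through generated training data, not applying a runtime filter like this one, so treat the sketch as an illustration of the intended result only.

```python
from dataclasses import dataclass
from enum import IntEnum


class Privilege(IntEnum):
    # Assumed ordering for illustration: a higher value means higher priority.
    THIRD_PARTY = 0  # e.g. tool output or retrieved web content
    USER = 1
    SYSTEM = 2       # e.g. text from the application developer


@dataclass(frozen=True)
class Instruction:
    source: Privilege
    setting: str  # hypothetical key naming what the instruction controls
    value: str


def resolve(instructions):
    """For each setting, keep the instruction from the most privileged source.

    On a tie in privilege, the earliest instruction is kept.
    """
    winners = {}
    for ins in instructions:
        current = winners.get(ins.setting)
        if current is None or ins.source > current.source:
            winners[ins.setting] = ins
    return winners


if __name__ == "__main__":
    conflicting = [
        Instruction(Privilege.SYSTEM, "language", "English"),
        # An injected instruction hidden in third-party content.
        Instruction(Privilege.THIRD_PARTY, "language", "French"),
        Instruction(Privilege.USER, "tone", "formal"),
    ]
    for setting, ins in resolve(conflicting).items():
        print(setting, "->", ins.value, f"(from {ins.source.name})")
```

In this toy example the third-party attempt to change the language is ignored because a system-level instruction already covers it, while the user's tone request stands because nothing more privileged conflicts with it.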

* GPT

* Language

* Reasoning & Policy

* Ethics & Safety

Authors

Eric Wallace

Kai Xiao

Reimar Leike

Lilian Weng

Johannes Heidecke

Alex Beutel
