Cyberweerbaarheid versterken naarmate AI-capaciteiten toenemen

Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyberdefense as well as new dual-use risks that must be managed carefully. For example, capabilities assessed through capture-the-flag (CTF) challenges have improved from 27% on GPT‑5⁠(opens in a new window) in August 2025 to 76% on GPT‑5.1‑Codex‑Max⁠(opens in a new window) in November 2025.

We expect that upcoming AI models will continue on this trajectory; in preparation, we are planning and evaluating as though each new model could reach ‘High’ levels of cybersecurity capability, as measured by our Preparedness Framework⁠(opens in a new window). By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects. This post explains how we think about safeguards for models that reach these levels of capability, and ensure they meaningfully help defenders while limiting misuse.

As these capabilities advance, OpenAI is investing in strengthening our models for defensive cybersecurity tasks and creating tools that enable defenders to more easily perform workflows such as auditing code and patching vulnerabilities. Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced.

Like other dual-use domains, defensive and offensive cyber workflows often rely on the same underlying knowledge and techniques. We are investing in safeguards to help ensure these powerful capabilities primarily benefit defensive uses and limit uplift for malicious purposes. Cybersecurity touches almost every field, which means we cannot rely on any single category of safeguards—such as restricting knowledge or using vetted access alone—but instead need a defense-in-depth approach that balances risk and empowers users. In practice, this means shaping how capabilities are accessed, guided, and applied so that advanced models strengthen security rather than lower barriers to misuse.

We see this work not as a one-time effort, but as a sustained, long-term investment in giving defenders an advantage and continually strengthening the security posture of the critical infrastructure across the broader ecosystem.

Mitigating malicious uses

Our models are designed and trained to operate safely, supported by proactive systems that detect and respond to cyber abuse. We continuously refine these protections as our capabilities and the threat landscape change. While no system can guarantee complete prevention of misuse in cybersecurity without severely impacting defensive uses, our strategy is to mitigate risk through a layered safety stack.

At the foundation of this, we take a defense-in-depth approach, relying on a combination of access controls, infrastructure hardening, egress controls, and monitoring. We complement these measures with detection and response systems, and dedicated threat intelligence and insider-risk programs, making it so emerging threats are identified and blocked quickly. These safeguards are designed to evolve with the threat landscape. We assume change, and we build so we can adjust quickly and appropriately.

Building on this foundation:

* Training the model to refuse or safely respond to harmful requests while remaining helpful for educational and defensive use cases:We are training our frontier models to refuse or safely respond to requests that would enable clear cyber abuse, while remaining maximally helpful for legitimate defensive and educational use cases.

* Detection systems:We refine and maintain system-wide monitoring across products that use frontier models to detect potentially malicious cyber activity. When activity appears unsafe, we may block output, route prompts to safer or less capable models, or escalate for enforcement. Our enforcement combines automated and human review, informed by factors like legal requirements, severity, and repeat behavior. We also work closely with developers and enterprise customers to align on safety standards and enable responsible use with clear escalation paths.

Ecosystem initiatives to strengthen cyber resilience

OpenAI has invested early in applying AI to defensive cybersecurity use cases and our team closely coordinates with global experts to mature both our models and their application. We value the global community of cybersecurity practitioners toiling to make our digital world safer and are committed to delivering powerful tools that support defensive security. As we roll out new safeguards, we will continue to work with the cybersecurity community to understand where AI can meaningfully strengthen resilience, and where thoughtful safeguards are most important.

Alongside these collaborations, we are establishing a set of efforts designed to help defenders move faster, ground our safeguards in real-world needs, and accelerate responsible remediation at scale.

Trusted access programs for cyberdefense

We will soon introduce a trusted access program where we explore providing qualifying users and customers working on cyberdefense with tiered access to enhanced capabilities in our latest models for defensive use cases. We're still exploring the right boundary of which capabilities we can provide broad access to and which ones require tiered restrictions, which may influence the future design of this program. We aim for this trusted access program to be a building block towards a resilient ecosystem.

Expanding defensive capacity with Aardvark

Aardvark, our agentic security researcher that helps developers and security teams find and fix vulnerabilities at scale, is now in private beta. It scans codebases for vulnerabilities and proposes patches that maintainers can adopt quickly. It has already identified novel CVEs in open-source software by reasoning over entire codebases. We plan to offer free coverage to select non-commercial open source repositories to contribute to the security of the open source software ecosystem and supply chain. Apply to participate here.

Frontier Risk Council

We will be establishing the Frontier Risk Council, an advisory group that will bring experienced cyber defenders and security practitioners into close collaboration with our teams. This council will start with a focus on cybersecurity, and expand into other frontier capability domains in the future. Members will advise on the boundary between useful, responsible capability and potential misuse, and these learnings will directly inform our evaluations and safeguards. We will share more on the council soon.

Developing a shared understanding on threat models with the industry

Finally, we anticipate cyber misuse may be viable from any frontier model in the industry. To address this, we work with other frontier labs through the Frontier Model Forum, a nonprofit backed by leading AI labs and industry partners, to develop a shared understanding of threat models and best practices. In this context, threat modeling helps mitigate risk by identifying how AI capabilities could be weaponized, where critical bottlenecks exist for different threat actors, and how frontier models might provide meaningful uplift. This collaboration aims to build a consistent, ecosystem-wide understanding of threat actors and attack pathways, enabling labs, maintainers, and defenders to better improve their mitigations and ensure critical security insights propagate quickly across the ecosystem. We are also engaging with external teams to develop cybersecurity evaluations⁠(opens in a new window). We hope an ecosystem of independent evaluations will further help build a shared understanding of model capabilities.

Together, these efforts reflect our long-term commitment to strengthening the defensive side of the ecosystem. As models become more capable, our goal is to help ensure those capabilities translate into real leverage for defenders—grounded in real-world needs, shaped by expert input, and deployed with care. Alongside this work, we plan to explore other initiatives and cyber security grants to help surface breakthrough ideas that may not emerge from traditional pipelines, and to crowdsource bold, creative defenses from across academia, industry, and the open-source community. Taken together, this is ongoing work, and we expect to keep evolving these programs as we learn what most effectively advances real-world security.

* 2025

Author

Strengthening cyber resilience as AI capabilities advance Cyberweerbaarheid versterken naarmate AI-capaciteiten toenemen

Quick editorial signal Snelle redactionele duiding

OpenAI

Help shape what we cover next Help bepalen wat we hierna volgen

More from OpenAI Meer van OpenAI

Our principles Our principles

Introducing GPT-5.5 GPT-5.5 geïntroduceerd

GPT-5.5 Bio Bug Bounty GPT-5.5 Bio Bug Bounty

How to get started with Codex Zo begin je met Codex