Capable, Open, and Safe: Combating AI Misuse

Black Forest Labs is pushing the frontier of visual intelligence. Our team has released some of the most capable and most popular AI models globally. As we grow, we're taking steps to combat misuse, protect the community, and show that high performance, open innovation, and sensible safeguards go hand in hand. Today, we're excited to share early results that help validate our mitigations for emerging risks—including synthetic non-consensual intimate imagery (NCII) and child sexual abuse material (CSAM)—and will help us strengthen these safeguards in the future.

On third-party evaluations, our latest FLUX.2 model family demonstrates >10 times fewer vulnerabilities for serious risks than other leading open-weight image models, including those from large technology firms.

Our targeted post-training mitigations help to reduce vulnerabilities by 77-98% prior to release.

Alongside other safeguards, these mitigations can meaningfully reduce the risk of widespread misuse. For example, industry-standard moderation practices during deployment can eliminate most, if not all, residual vulnerabilities for NCII and CSAM.

Black Forest Labs is committed to open innovation

Black Forest Labs is committed to open research and development as the bedrock for competition, innovation, and security in AI. By sharing our research breakthroughs openly, we can help to accelerate the discovery of new techniques, methods, and architectures. By sharing our models openly, we enable developers around the world to build new tools that we can scarcely imagine today—from kitchen table startups to the world’s largest enterprises. To date, our founding team has contributed three of the four most popular open-weight AI models on Hugging Face, totaling over 400 million downloads. Today, our FLUX family leads the most capable open-weight image models.

We face unique challenges

We take our responsibility to mitigate emerging risks seriously. The properties that make open-weight models useful can also pose a unique challenge for risk management. These models can be deployed independently without oversight from the original developer and without appropriate safeguards, in some cases using consumer hardware. They can be modified or integrated with other systems for unauthorized purposes. If a vulnerability is discovered after release, it is not possible to fully withdraw all copies of the affected model.

We respond with layers of mitigation

While there are no silver bullets to prevent all misuse, layers of mitigation can help to prevent widespread misuse. Before each release, we evaluate a number of risks, including the production of unlawful content, with a focus on synthetic NCII and CSAM. We implement a series of pre-release mitigations in our models to help prevent misuse, with other post-release safeguards to address residual vulnerabilities. These mitigations incorporate best practices outlined by nonprofit organizations such as Thorn as well as agencies such as the US National Institute of Standards and Technology and the UK Office of Communications.

Pre-training. We filter pre-training data for multiple categories of nude and pornographic material and known CSAM. Limiting exposure to this data in the first place can help prevent a user generating unlawful content, whether by eliciting harmful features in the data, or by combining lawful features into an unlawful composite image. We have partnered with the Internet Watch Foundation, an independent nonprofit organization dedicated to preventing online abuse, to filter known CSAM from the training data.

Post-training. We undertake multiple rounds of targeted fine-tuning to provide additional mitigation against potential abuse, spanning both text-to-image (T2I) and image-to-image (I2I) attacks. By suppressing certain concepts in the trained model, these techniques can help prevent a user generating synthetic NCII or CSAM from a text prompt, or transforming an uploaded image into synthetic NCII or CSAM.

Deployment. We release our most capable open-weight models with enforceable licenses that prohibit unlawful or infringing misuse, and require the use of filters during inference. With our open-weight models, we provide filters to help deployers detect violative or infringing activity. On our hosted services, we implement filters for a range of content types (including sexual content, hate, violence and gore, and representations of self-harm) and maintain a reporting relationship with the U.S. National Center for Missing and Exploited Children. Additionally, we apply content provenance metadata on our hosted services to help platforms and viewers identify AI-generated content once it is shared online. We include links to the Coalition for Content Provenance and Authenticity (C2PA) in our open-weight repositories to help other developers implement this metadata.

After deployment. We subsequently monitor for patterns of violative use in both our hosted services and the open developer community. We issue and escalate takedown requests to websites, services, or businesses that misuse our models. Additionally, we may ban users or developers we detect violating our policies. We provide a dedicated email hotline to solicit feedback from the community, and welcome ongoing engagement with authorities, developers, and researchers to share intelligence about emerging risks and effective mitigations.

Third-party testing informed our release decision

Throughout the development lifecycle, we conduct multiple internal and external evaluations to identify further opportunities for mitigation. For our latest open-weight model family, FLUX.2, we partnered with Cinder to conduct third-party red-teaming prior to each of our five open-weight model releases. These included FLUX.2 [dev] (a 32 billion parameter model based on a rectified flow transformer architecture that enables high-quality image generation and editing) and FLUX.2 [klein], a derivative series of four size-distilled and step-distilled models, ranging from 4 to 9 billion parameters, optimized for local deployment, faster inference, and improved photorealism.

Prior to release, we tasked Cinder with evaluating these models throughout their development lifecycle, including early, intermediate, and final checkpoints. These evaluations focused on identifying CSAM and NCII vulnerabilities across a range of T2I and I2I attacks. By observing the models’ behavior before, during, and after fine-tuning and distillation, we could better refine our mitigation strategy and make a considered release decision. We also instructed Cinder to run the same evaluation on leading open-weight models from other firms to help assess the marginal risk of our releases compared to the baseline.

Attacks included prompts that:

Directly attempt to elicit violative content;

Obscure a violative request in an otherwise benign context;

Obfuscate intent through indirect language, “l33t”, or scrambling;

Attempt to construct a violative image by assembling features that are individually nonviolative; and

Request analogous visual features or substitutes in place of violative terms.

Attacks also included attempts to:

Undress, simulate, or reimagine an otherwise clothed individual;

De-age an individual;

Splice together multiple individuals to produce a violative composite figure; and

Merge multiple images into a violative scene.

Comparative risk. Our five models demonstrated over 10 times fewer vulnerabilities than other popular open-weight models, indicating a higher robustness to misuse. These include models recently developed or funded by large technology firms with substantial resources, such as Alibaba, Tencent, and ByteDance.

Progressive mitigation. Our post-training mitigations yielded a 77-98 percent reduction in vulnerabilities compared to earlier checkpoints. Importantly, our most lightweight and efficient [klein] models (those likely to experience the widest adoption for local inference) demonstrated the fewest vulnerabilities.

Residual vulnerabilities. Subsequent evaluations with Cinder suggested that residual vulnerabilities can be nearly eliminated in deployment through the adoption of industry-standard moderation practices.
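For readers who want to see how figures such as "77-98 percent reduction" and "over 10 times fewer" relate to raw red-teaming counts, here is a minimal sketch of the arithmetic. The function names and every number below are hypothetical, chosen only to illustrate the calculation; they are not Cinder's methodology or measured results.

```python
def vulnerability_rate(violative_outputs: int, attack_attempts: int) -> float:
    """Share of attack attempts that produced a violative output."""
    if attack_attempts <= 0:
        raise ValueError("attack_attempts must be positive")
    return violative_outputs / attack_attempts


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from an earlier checkpoint to a later one, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


def fold_difference(baseline: float, model: float) -> float:
    """How many times fewer vulnerabilities the model shows than a baseline."""
    if model <= 0:
        raise ValueError("model rate must be positive")
    return baseline / model


# Hypothetical counts, for illustration only (not measured results).
early_checkpoint = vulnerability_rate(200, 1000)
final_checkpoint = vulnerability_rate(10, 1000)
other_model = vulnerability_rate(150, 1000)

# Relative improvement across checkpoints of the same model.
print(round(percent_reduction(early_checkpoint, final_checkpoint), 1))

# Comparison of the final checkpoint against a different model.
print(round(fold_difference(other_model, final_checkpoint), 1))
```

The two metrics answer different questions: percent reduction compares a model with its own earlier checkpoint, while the fold difference compares a released model with a separate baseline evaluated on the same attacks.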
