Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Chieh-Ming Jiang, Ching-Chun Huang, Wei-Chen Chiu, Pin-Yu Chen, Zhi-Yi Chin

doi:10.48550/arxiv.2309.06135

Abstract

Text-to-image diffusion models such as Stable Diffusion (SD) have recently shown a remarkable ability to generate high-quality content and have become one of the representatives of the recent wave of transformative AI. This advance, however, comes with intensifying concern about misuse of the technology, especially for producing copyrighted or NSFW (not safe for work) images. Although efforts have been made to filter inappropriate images and prompts or to remove undesirable concepts and styles via model fine-tuning, the reliability of these safety mechanisms against diverse problematic prompts remains largely unexplored. In this work, we propose Prompting4Debugging (P4D), a debugging and red-teaming tool that automatically finds problematic prompts for diffusion models in order to test the reliability of a deployed safety mechanism. We demonstrate the efficacy of P4D in uncovering new vulnerabilities of SD models equipped with safety mechanisms. In particular, our results show that around half of the prompts in existing safe prompting benchmarks, originally considered "safe", can be manipulated to bypass many deployed safety mechanisms, including concept removal, negative prompts, and safety guidance. Our findings suggest that, without comprehensive testing, evaluations on limited safe prompting benchmarks can lead to a false sense of safety for text-to-image models.
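
To make the idea of red-teaming a safety mechanism concrete, the toy sketch below may help. It is not the P4D method, which the abstract does not spell out: the keyword blocklist, the synonym table, the stand-in "pipeline", and all function names are hypothetical, and no diffusion model is involved. It only illustrates the evaluation pattern the abstract describes: start from prompts a safeguarded pipeline handles safely, search for rewrites that slip past the safeguard, and report the fraction of prompts for which such a rewrite exists.

```python
# Toy illustration only. Every name and value here is a hypothetical stand-in;
# this is NOT the P4D algorithm and says nothing about real models.

BLOCKLIST = {"gore", "blood"}                      # toy prompt filter (the "safety mechanism")
SYNONYMS = {"blood": ["crimson fluid"],            # toy rewrite table used by the search
            "gore": ["viscera"]}
UNSAFE_CONCEPTS = {"blood", "crimson fluid",       # phrases the toy "model" renders as unsafe
                   "gore", "viscera"}


def safety_filter_blocks(prompt):
    """Return True if the toy keyword filter rejects the prompt."""
    return any(word in BLOCKLIST for word in prompt.lower().split())


def pipeline_is_unsafe(prompt):
    """Safeguarded toy pipeline: blocked prompts produce nothing (safe);
    otherwise the output is unsafe if the prompt mentions an unsafe concept."""
    if safety_filter_blocks(prompt):
        return False
    return any(concept in prompt.lower() for concept in UNSAFE_CONCEPTS)


def find_problematic_prompt(prompt):
    """Try single-word substitutions and return the first rewrite that
    yields unsafe output despite the filter, or None if none is found."""
    words = prompt.lower().split()
    for i, word in enumerate(words):
        for alt in SYNONYMS.get(word, []):
            candidate = " ".join(words[:i] + [alt] + words[i + 1:])
            if pipeline_is_unsafe(candidate):
                return candidate
    return None


def bypass_rate(benchmark):
    """Fraction of prompts handled safely by the pipeline for which the
    search nevertheless finds a rewrite that bypasses the safeguard."""
    safe = [p for p in benchmark if not pipeline_is_unsafe(p)]
    hits = [p for p in safe if find_problematic_prompt(p) is not None]
    return len(hits) / len(safe) if safe else 0.0


benchmark = ["a pool of blood", "a quiet meadow", "gore and blood",
             "blood on snow", "a red sunset"]
rate = bypass_rate(benchmark)
```

In this toy setup a prompt counts as "safe" simply because the filter blocks it, which is the kind of limited evaluation the abstract warns about: a benchmark can look fully safe while a simple rewrite search still finds bypasses for some of its prompts.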

