In recent years, Stable Diffusion models (SDMs) have been widely used for text-to-image generation, and protecting their copyright has attracted growing research attention. Model owners can embed watermarks into SDMs by fine-tuning them and then use prompt-watermark pairs to authenticate model ownership. However, attackers can obscure ownership by forging a relationship between a fake prompt and the watermark image. This paper therefore proposes a black-box copyright protection method for SDMs that effectively resists such watermark ambiguity attacks. Specifically, we adopt an irreversible watermarking technique for watermark embedding: a hash function generates the trigger prompts from a secret key in a one-way, irreversible manner. The trigger set, consisting of the trigger prompts and their watermarks, is then used to fine-tune the SDM and embed the watermarks. Without the secret key, attackers cannot reconstruct prompts that carry the required internal associations. Experiments show that our method effectively protects the copyright of SDMs and resists ambiguity attacks without degrading model performance.
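
The one-way derivation of trigger prompts from a secret key can be illustrated with a minimal sketch. It is not the construction used in this paper: the choice of HMAC-SHA256 as the keyed hash, the toy vocabulary `VOCAB`, and the helper names `trigger_prompt`, `build_trigger_prompts`, and `verify_trigger_prompts` are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical 16-word vocabulary; a real scheme would use a far larger one.
VOCAB = [
    "amber", "bridge", "cloud", "delta", "ember", "forest", "glass", "harbor",
    "ivory", "jungle", "kettle", "lantern", "meadow", "nebula", "orchid", "pebble",
]


def trigger_prompt(secret_key: bytes, index: int, length: int = 6) -> str:
    """Derive one trigger prompt from the key and a prompt index (assumed scheme)."""
    # Keyed hash: easy to compute with the key, infeasible to invert without it.
    digest = hmac.new(secret_key, index.to_bytes(4, "big"), hashlib.sha256).digest()
    # Map the first `length` digest bytes onto vocabulary words.
    words = [VOCAB[digest[i] % len(VOCAB)] for i in range(length)]
    return " ".join(words)


def build_trigger_prompts(secret_key: bytes, count: int) -> list[str]:
    """Generate the prompts that would be paired with watermark images."""
    return [trigger_prompt(secret_key, i) for i in range(count)]


def verify_trigger_prompts(secret_key: bytes, claimed: list[str]) -> bool:
    """Check that every claimed prompt is reproducible from the revealed key."""
    return claimed == build_trigger_prompts(secret_key, len(claimed))


owner_prompts = build_trigger_prompts(b"owner-secret-key", 4)
forged_prompts = build_trigger_prompts(b"attacker-guess", 4)
```

In this sketch, an owner who reveals the key can have the prompts regenerated and checked with `verify_trigger_prompts`, whereas an attacker who picks arbitrary prompts would have to find a key that hashes to them, which is the property the irreversibility argument relies on.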