High-quality datasets are essential for training high-performance models, but collecting, cleaning, and labeling them is costly. As a result, datasets are considered valuable intellectual property. However, when security mechanisms are compromised, creating exploitable vulnerabilities, unauthorized use or data leakage can infringe on the copyright of dataset owners. In this study, we propose a clean-label dataset watermarking method based on trigger optimization, aiming to protect datasets from copyright infringement. We first iteratively optimize the trigger on a surrogate model, with target-class samples guiding the updates. This process ensures that the optimized trigger contains robust feature representations of the watermark target class. A watermarked dataset is then obtained by embedding the optimized trigger into randomly selected samples from the watermark target class, leaving their labels unchanged. If an adversary trains a model on the watermarked dataset, the watermark will manipulate the model’s output. By observing the output of a suspect model on samples carrying the trigger, it can be determined whether the model was trained on the watermarked dataset. Experimental results demonstrate that the proposed method exhibits high imperceptibility and strong robustness against pruning and fine-tuning attacks. Compared to existing methods, the proposed method significantly improves effectiveness at very low watermarking rates.
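The embedding and verification steps described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names (`stamp`, `embed_watermark`, `verify`), the perturbation bound `eps`, the decision `threshold`, and the definition of the watermarking rate as a fraction of the whole dataset are all assumptions made for illustration. The trigger is taken as given here; its optimization on a surrogate model is not shown.

```python
import random

def stamp(x, trigger, eps):
    # Add the trigger to one flattened sample, clipping the perturbation to
    # [-eps, eps] and the result to the valid pixel range [0, 1].
    # The eps bound is an assumption standing in for "imperceptibility".
    return [min(1.0, max(0.0, v + max(-eps, min(eps, t))))
            for v, t in zip(x, trigger)]

def embed_watermark(samples, labels, trigger, target_class, rate,
                    eps=8 / 255, seed=0):
    # Clean-label embedding: only samples that already belong to the target
    # class are modified, and no label is changed.
    # Assumption: `rate` is the fraction of the whole dataset to watermark.
    rng = random.Random(seed)
    target_idx = [i for i, y in enumerate(labels) if y == target_class]
    n_marked = min(len(target_idx), int(round(rate * len(samples))))
    chosen = set(rng.sample(target_idx, n_marked))
    marked = [stamp(x, trigger, eps) if i in chosen else list(x)
              for i, x in enumerate(samples)]
    return marked, sorted(chosen)

def verify(predict, probes, trigger, target_class, threshold=0.5,
           eps=8 / 255):
    # Query the suspect model on triggered probe samples and measure how
    # often it outputs the target class. The fixed threshold is a
    # hypothetical decision rule, not the test used in the paper.
    stamped = [stamp(x, trigger, eps) for x in probes]
    hits = sum(1 for x in stamped if predict(x) == target_class)
    hit_rate = hits / len(stamped)
    return hit_rate, hit_rate >= threshold
```

In use, `predict` would wrap the suspect model's label output, and `probes` would be samples from classes other than the target class, so that a high hit rate is attributable to the trigger rather than to the samples themselves.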