The unprecedented success of deep learning could not have been achieved without the synergy of big data, computing power, and human knowledge, none of which comes for free. This calls for copyright protection of deep neural networks (DNNs), which has been tackled via DNN watermarking. Owing to the special structure of DNNs, backdoor watermarking has become one of the most popular solutions. In this article, we first present a big picture of DNN watermarking scenarios, with rigorous definitions unifying the black-box and white-box concepts across the watermark embedding, attack, and verification phases. Then, from the perspective of data diversity, especially the adversarial and open-set examples overlooked in existing works, we rigorously reveal the vulnerability of backdoor watermarks to black-box ambiguity attacks. To solve this problem, we propose an unambiguous backdoor watermarking scheme based on deterministically dependent trigger samples and labels, showing that the cost of ambiguity attacks increases from the existing linear complexity to exponential complexity. Furthermore, noting that the existing definition of backdoor fidelity is concerned solely with classification accuracy, we propose to evaluate fidelity more rigorously by examining training data feature distributions and decision boundaries before and after backdoor embedding. Incorporating the proposed prototype-guided regularizer (PGR) and fine-tune-all-layers (FTAL) strategy, we show that backdoor fidelity can be substantially improved. Experimental results on two versions of the basic ResNet18, the advanced wide residual network (WRN28_10), and EfficientNet-B0, evaluated on the MNIST, CIFAR-10, CIFAR-100, and FOOD-101 classification tasks, respectively, illustrate the advantages of the proposed method.
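The core of the unambiguity argument is that trigger labels are bound deterministically to trigger samples, so a forger cannot pair arbitrary samples with arbitrary labels. The following is a minimal sketch of this idea, not the paper's actual construction: it assumes a keyed hash is used to derive each trigger label from the sample content, so that without the secret key an attacker must guess each label independently, with success probability decaying exponentially in the trigger-set size.

```python
import hashlib

NUM_CLASSES = 10  # assumed size of the classification task (hypothetical)


def trigger_label(sample_bytes: bytes, key: bytes) -> int:
    """Derive a trigger label deterministically from the sample and a secret key.

    Without the key, matching all n trigger samples to their required labels
    by chance succeeds with probability (1 / NUM_CLASSES) ** n, i.e. the
    forgery cost grows exponentially in the trigger-set size, rather than
    linearly as when labels can be assigned freely.
    """
    digest = hashlib.sha256(key + sample_bytes).digest()
    return digest[0] % NUM_CLASSES


# Build a small trigger set whose labels are bound to the sample contents.
key = b"owner-secret-key"  # hypothetical owner secret
trigger_set = [
    (bytes([i] * 8), trigger_label(bytes([i] * 8), key)) for i in range(5)
]
```

Under these assumptions, verification recomputes each label from the sample and the key and checks that the watermarked model predicts it, which an ambiguity attacker cannot replicate without the key.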