Abstract

Recent research shows that deep learning models are susceptible to backdoor attacks. Many defenses against backdoor attacks have been proposed; however, existing defenses require high computational overhead or prior knowledge of the attack, such as the trigger size. In this paper, we propose a novel backdoor detection method based on intentional adversarial perturbations. The proposed method leverages an intentional adversarial perturbation to detect whether an image contains a trigger, and can be applied in both the training stage and the inference stage (to sanitize the training set in the training stage, or to detect backdoor instances in the inference stage). Specifically, given an untrusted image, an adversarial perturbation is intentionally added to the image. If the model's prediction on the perturbed image is consistent with its prediction on the unperturbed image, the input image is considered a backdoor instance. Compared with most existing defenses, the proposed method is faster and introduces less computational overhead during the backdoor detection process. Moreover, the proposed method maintains the visual quality of the image: the ℓ2 norm of the added perturbation is as low as 2.8715, 3.0513 and 2.4362 on the Fashion-MNIST, CIFAR-10 and GTSRB datasets, respectively. Experimental results show that, for general backdoor attacks, the backdoor detection rate of the proposed defense is 99.63%, 99.76% and 99.91% on Fashion-MNIST, CIFAR-10 and GTSRB, respectively. For invisible backdoor attacks, the detection rate is 99.75% against the blended backdoor attack and 98.00% against the sample-specific backdoor attack. The proposed method is also shown to achieve high defense performance against backdoor attacks under different attack settings (trigger transparency, trigger size and trigger pattern).
In addition, an experimental comparison with related work demonstrates that the proposed method achieves better detection performance and higher detection efficiency.
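The detection rule described above (a prediction that survives an intentional adversarial perturbation is flagged as a backdoor instance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: a toy linear classifier stands in for the defended network so the input gradient has a closed form, and the FGSM-style step, the `eps` value, and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the defended DNN: a linear classifier over
# flattened 32x32x3 images with 10 classes (CIFAR-10-sized, for illustration).
W = rng.normal(size=(10, 3 * 32 * 32))

def predict(x: np.ndarray) -> int:
    """Predicted class label for a flattened image x in [0, 1]."""
    return int(np.argmax(W @ x))

def intentional_perturbation(x: np.ndarray, eps: float = 0.05) -> np.ndarray:
    """One FGSM-style step against the predicted class.

    For a linear model, the gradient of the predicted-class logit with
    respect to the input is simply the corresponding row of W; stepping
    against its sign is designed to flip a clean image's prediction.
    """
    grad = W[predict(x)]
    return np.clip(x - eps * np.sign(grad), 0.0, 1.0)

def is_backdoor_instance(x: np.ndarray, eps: float = 0.05) -> bool:
    """Flag x as a backdoor instance if the perturbation fails to
    change the model's prediction (the paper's detection criterion)."""
    return predict(x) == predict(intentional_perturbation(x, eps))
```

The same check works in both stages: run `is_backdoor_instance` over the training set to sanitize it, or over each incoming query at inference time.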
