Abstract

Backdoor attacks on deep neural networks (DNNs) that use targeted universal adversarial perturbations (TUAPs) require neither access to the training dataset nor tampering with the model, and TUAP-based triggers can force a DNN to output any class the adversary chooses. Retraining DNNs with adversarial training to restore security is time-consuming and cannot be applied to DNNs already deployed at runtime. We instead detect backdoors with a black-box testing approach. We observe that when random noise is superimposed on a backdoored input, the model's output tends to remain unchanged, so we propose a Sequential Analysis method based on Metamorphic Testing (SAMT). We design two metamorphic relations for test case generation. Using sequential sampling, we compute the label stability rate (LSR) and infer whether the image under verification contains a trigger from the change in the sequential probability ratio. Experimental results show that our method achieves a higher backdoor detection success rate (DSR) than state-of-the-art detection algorithms. Moreover, our method does not require knowledge of the DNN's internal structure, giving it greater adaptability and generalization ability. Building on the proposed method, a backdoor detection layer can simply be added to detect backdoors as early as possible, ultimately alleviating the harm of such attacks.
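
To illustrate the core idea, the sketch below shows how a label-stability check combined with a sequential probability ratio test (SPRT) might look in practice. It is a minimal sketch under stated assumptions, not the paper's exact algorithm: the `model` callable, the stability probabilities `p_clean` and `p_trigger`, and the noise scale `sigma` are all hypothetical placeholders introduced for illustration.

```python
import math
import numpy as np

def sprt_trigger_check(model, x, p_clean=0.5, p_trigger=0.9,
                       alpha=0.05, beta=0.05, sigma=0.1, max_samples=100):
    """Decide whether input x likely contains a backdoor trigger.

    model: black-box classifier mapping an image array to a label (assumed).
    p_clean / p_trigger: assumed label-stability probabilities under the
        clean and triggered hypotheses (hypothetical values).
    alpha / beta: tolerated false-positive / false-negative rates.
    """
    base_label = model(x)
    # Wald SPRT decision thresholds on the log-likelihood ratio.
    upper = math.log((1 - beta) / alpha)   # accept "triggered"
    lower = math.log(beta / (1 - alpha))   # accept "clean"
    llr = 0.0
    for _ in range(max_samples):
        # Metamorphic transformation: superimpose random noise on the input.
        noisy = np.clip(x + np.random.normal(0.0, sigma, x.shape), 0.0, 1.0)
        stable = (model(noisy) == base_label)
        # Update the log-likelihood ratio with this Bernoulli observation:
        # triggered inputs keep their label under noise far more often.
        if stable:
            llr += math.log(p_trigger / p_clean)
        else:
            llr += math.log((1 - p_trigger) / (1 - p_clean))
        if llr >= upper:
            return True   # label highly stable under noise: likely triggered
        if llr <= lower:
            return False  # label flips under noise: likely clean
    return llr > 0  # undecided after max_samples: fall back to LLR sign
```

Because the test is sequential, it typically terminates after only a handful of noisy queries rather than a fixed large sample, which is what makes such a check cheap enough to run as a detection layer in front of a deployed model.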
