Defense against adversarial attack by feature distillation and metric learning

Xiang Xiang, Pengfei Zhang, Yi Xu, Xiaoming Ju

doi:10.1117/12.2645603

Publication Date: Aug 23, 2022

Affiliation: East China Normal University

Abstract

In recent years, deep neural networks have achieved high accuracy in many classification tasks, including speech recognition, object detection, and image classification. Although deep neural networks are robust to random noise, small perturbations that are imperceptible to the human eye, when added to the network input, can still cause the model to output wrong predictions. To defend against such adversarial samples, we propose an adversarial training method that combines feature distillation and metric learning. A teacher network is pretrained on clean samples and then fixed, while the student network is adversarially trained on adversarial samples. During training, the intermediate-layer features that the teacher network extracts from clean samples guide the intermediate-layer features that the student network extracts from adversarial samples, so that the student network repairs the intermediate-layer features of the adversarial samples. At the same time, to account for the relationship between adversarial samples and clean samples in the student network, a metric learning loss is introduced on the intermediate-layer features of the student network, so that an adversarial sample lies closer to the clean samples than to the confused samples. This makes the deep neural network model more robust. Finally, we perform gray-box, white-box, and black-box attacks to verify the effectiveness of our method. Our algorithm significantly outperforms state-of-the-art adversarial training algorithms.
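
The two feature-level terms described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss functions, distance measure, margin, or weighting, so everything below is an assumption: a mean-squared-error distillation term, a Euclidean triplet margin term, and the hypothetical names `distillation_loss`, `triplet_loss`, `combined_feature_loss`, `alpha`, `beta`, and `margin`. Feature vectors are plain Python lists standing in for intermediate-layer activations, and the classification loss used in adversarial training is left out.

```python
import math


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def distillation_loss(teacher_clean, student_adv):
    """Assumed distillation term: mean squared error between the teacher's
    features for a clean sample and the student's features for its
    adversarial counterpart."""
    return squared_l2(teacher_clean, student_adv) / len(teacher_clean)


def triplet_loss(anchor_adv, positive_clean, negative_confused, margin=1.0):
    """Assumed metric learning term: a triplet margin loss on student features.
    It is zero once the adversarial anchor is closer to the clean sample than
    to the confused sample by at least `margin`."""
    d_pos = math.sqrt(squared_l2(anchor_adv, positive_clean))
    d_neg = math.sqrt(squared_l2(anchor_adv, negative_confused))
    return max(0.0, d_pos - d_neg + margin)


def combined_feature_loss(teacher_clean, student_adv, student_clean,
                          student_confused, alpha=1.0, beta=1.0, margin=1.0):
    """Weighted sum of the two terms; alpha and beta are hypothetical weights."""
    distill = distillation_loss(teacher_clean, student_adv)
    metric = triplet_loss(student_adv, student_clean, student_confused, margin)
    return alpha * distill + beta * metric
```

In a real training loop these terms would be computed on batched tensors in a deep learning framework and added to the student's classification loss on adversarial samples; the list-based version here only shows how the quantities relate.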
