Abstract

Deep neural networks (DNNs) are easily fooled by adversarial examples, which are generated by adding small, well-designed, human-imperceptible perturbations to clean examples. Such adversarial examples mislead deep learning (DL) models into making wrong predictions. Many existing white-box attack methods in the image domain rely on the global gradient of the model: the global gradient is computed first, and the perturbation is then added along the gradient direction. These methods usually achieve a high attack success rate, but they also have shortcomings, such as excessive perturbation that is easily detected by the human eye. Therefore, in this paper we propose a Saliency Map-based Local white-box Adversarial Attack method (SMLAA), which introduces the saliency map used in the interpretability of artificial intelligence. First, Gradient-weighted Class Activation Mapping (Grad-CAM) provides a visual interpretation of the model's decision to locate the important regions of an image. Then, the perturbation is added only to these important local regions, reducing its overall magnitude. Experimental results show that, compared with global attack methods, SMLAA reduces the average robustness measure by 9%–24% while maintaining the attack success rate, i.e., SMLAA achieves a high attack success rate with fewer pixels changed.
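The abstract does not give implementation details, but the general idea of masking a gradient-based perturbation with a Grad-CAM saliency map can be illustrated with a minimal PyTorch sketch. The helper names (`grad_cam_mask`, `local_fgsm`), the `conv_layer`, `keep_ratio`, and `eps` parameters, and the single FGSM-style step are assumptions for illustration, not the paper's exact SMLAA procedure:

```python
import torch
import torch.nn.functional as F

def grad_cam_mask(model, conv_layer, x, target, keep_ratio=0.3):
    """Hypothetical helper: Grad-CAM saliency thresholded into a binary mask."""
    acts, grads = [], []
    h1 = conv_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(x)[0, target]           # class score for the target label
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)            # GAP of gradients
    cam = F.relu((weights * acts[0]).sum(dim=1, keepdim=True))   # weighted activation maps
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    thresh = torch.quantile(cam.flatten(), 1 - keep_ratio)       # keep top keep_ratio pixels
    return (cam >= thresh).float()                               # 1 = salient region

def local_fgsm(model, conv_layer, x, label, eps=8 / 255):
    """Hypothetical FGSM-style step restricted to the Grad-CAM mask (an assumption,
    not necessarily the optimizer used in the paper)."""
    mask = grad_cam_mask(model, conv_layer, x, label)
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), torch.tensor([label]))
    loss.backward()
    x_adv = x_adv + eps * x_adv.grad.sign() * mask               # perturb salient pixels only
    return x_adv.clamp(0, 1).detach()
```

Restricting the perturbation to the masked region is what keeps the number of changed pixels, and hence the robustness measure, small relative to a global attack.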
