Abstract

It has been known for several years that deep neural networks are vulnerable to adversarial examples. In the black-box setting, transfer-based methods craft adversarial examples on a white-box model, which serves as a surrogate, in the hope that the same examples also fool the black-box model. However, these methods achieve high success rates on the surrogate model but exhibit weak transferability to the black-box model. In addition, some studies have shown that deep neural networks are also vulnerable to sparse alterations of the input, but existing sparse attacks mainly restrict the number of attacked pixels without limiting the size of the perturbations, which makes the perturbations perceptible to the human eye. To address these problems, we propose a transfer-based sparse attack method, called the adaptive momentum variance based iterative gradient method with a class activation map, which combines a simple adaptive momentum variance with a perturbation-refining mechanism to improve the transferability of adversarial examples. In addition, a class activation map, also known as an attention mechanism, is employed to explore the relationship between the number of perturbed pixels and the attack performance when the perturbation intensity is limited. The proposed method is compared with several state-of-the-art transfer-based adversarial attack methods on the ImageNet dataset, and the empirical results demonstrate that it achieves a significant increase in transferability while attacking only about 50% of the pixels.
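
To make the general attack family described above more concrete, the following is a minimal sketch of a momentum-based iterative gradient attack whose perturbation is restricted to a sparse pixel mask derived from an importance map (for example, a class activation map). This is not the authors' exact method: the adaptive momentum variance and the perturbation-refining mechanism are omitted, and the function name, parameters, and default values below are illustrative assumptions only.

```python
# Sketch only: generic MI-FGSM-style attack constrained to a sparse pixel mask.
# All names and hyperparameters here are illustrative, not the paper's method.
import torch
import torch.nn.functional as F


def sparse_momentum_attack(model, x, y, importance_map,
                           eps=16 / 255, steps=10, mu=1.0, pixel_ratio=0.5):
    """Perturb only the top `pixel_ratio` fraction of pixels ranked by
    `importance_map` (shape: N x 1 x H x W), keeping the L-infinity size
    of the perturbation within `eps`."""
    alpha = eps / steps                       # per-step step size
    # Build a binary mask selecting the most important pixels.
    flat = importance_map.flatten(1)
    k = int(pixel_ratio * flat.shape[1])
    thresh = flat.topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
    mask = (importance_map >= thresh).float()

    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                   # accumulated momentum
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Momentum accumulation with an L1-normalized gradient (MI-FGSM style).
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        # Signed update applied only on the masked (attended) pixels.
        x_adv = x_adv.detach() + alpha * g.sign() * mask
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

With `pixel_ratio=0.5`, the mask confines the perturbation to roughly half of the pixels, mirroring the abstract's reported setting of attacking about 50% of the pixels while bounding the perturbation intensity by `eps`.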
