Abstract

Adversarial attacks threaten the application of deep neural networks in security-sensitive scenarios. Most existing black-box attacks fool the target model by interacting with it many times and producing global perturbations. However, not all pixels are equally crucial to the target model; thus, treating all pixels indiscriminately inevitably increases query overhead. In addition, existing black-box attacks take clean samples as starting points, which also limits query efficiency. In this article, we propose a novel black-box attack framework, built on a dual-transferability (DT) strategy, that perturbs only the discriminative areas of clean examples within a limited number of queries. The first kind of transferability is the transferability of model interpretations: based on this property, we identify the discriminative areas of clean samples for generating local perturbations. The second is the transferability of adversarial examples, which helps us produce local pre-perturbations that further improve query efficiency. We achieve both kinds of transferability through an independent auxiliary model and incur no extra query overhead. After identifying discriminative areas and generating pre-perturbations, we use the pre-perturbed samples as better starting points and further perturb them locally in a black-box manner to search for the corresponding adversarial examples. The DT strategy is general, so the proposed framework can be applied to different types of black-box attacks. Extensive experiments show that, under various system settings, our framework significantly improves both the query efficiency and attack success rates of existing black-box attacks.
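
The following is a minimal PyTorch sketch of the DT idea, under broad assumptions: the auxiliary model is any locally available surrogate classifier, the discriminative area is approximated with a simple input-gradient saliency map, and the pre-perturbation is a single masked FGSM step on that surrogate. The names `saliency_mask` and `pre_perturb` are illustrative placeholders, not the authors' API; the paper's actual interpretation method and pre-perturbation procedure may differ.

```python
# Hypothetical sketch of the dual-transferability (DT) pipeline:
# (1) use a surrogate's interpretation to locate discriminative pixels,
# (2) locally pre-perturb those pixels via transfer (masked FGSM),
# (3) hand the result to any query-based black-box attack as its start point.
import torch
import torch.nn.functional as F


def saliency_mask(surrogate, x, y, keep_ratio=0.1):
    """Mark the top-`keep_ratio` most salient pixels of x w.r.t. the surrogate."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x), y)
    grad, = torch.autograd.grad(loss, x)
    score = grad.abs().sum(dim=1, keepdim=True)          # per-pixel saliency (B,1,H,W)
    k = max(1, int(keep_ratio * score[0].numel()))
    thresh = score.flatten(1).topk(k, dim=1).values[:, -1]
    mask = (score >= thresh.view(-1, 1, 1, 1)).float()   # 1 = discriminative pixel
    return mask.expand_as(x)


def pre_perturb(surrogate, x, y, mask, eps=8 / 255):
    """One masked FGSM step on the surrogate: a local, transfer-based start point."""
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x + eps * mask * grad.sign()).clamp(0, 1).detach()


# Usage (sketch): the pre-perturbed sample and its mask would then be passed to
# an existing query-based black-box attack, which searches only inside the
# masked region, starting from x_start rather than the clean sample.
# surrogate = ...                                  # independent auxiliary model
# mask = saliency_mask(surrogate, x, y)
# x_start = pre_perturb(surrogate, x, y, mask)
```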
