Abstract

Although adversarial examples pose a serious threat to deep neural networks, most transferable adversarial attacks are ineffective against black-box defense models. This may lead to the mistaken belief that adversarial examples are not truly threatening. In this paper, we propose a novel transferable attack that can defeat a wide range of black-box defenses and highlight their security limitations. We identify two intrinsic reasons why current attacks may fail, namely data-dependency and network-overfitting. They provide a different perspective on improving the transferability of attacks. To mitigate the data-dependency effect, we propose the Data Erosion method. It involves finding special augmentation data that behave similarly in both vanilla models and defenses, to help attackers fool robustified models with higher chances. In addition, we introduce the Network Erosion method to overcome the network-overfitting dilemma. The idea is conceptually simple: it extends a single surrogate model to an ensemble structure with high diversity, resulting in more transferable adversarial examples. Two proposed methods can be integrated to further enhance the transferability, referred to as Erosion Attack (EA). We evaluate the proposed EA under different defenses that empirical results demonstrate the superiority of EA over existing transferable attacks and reveal the underlying threat to current robust models. The source code is publicly available at https://github.com/mesunhlf/EA.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call