Abstract

We investigate the transferability of adversarial attacks against deep neural networks (DNNs): the contagion effect whereby adversarial examples that deceive one DNN model can easily deceive other DNN models built on similar data. We demonstrate that introducing randomness into DNN models can break this transferability, provided the adversary does not have an unlimited attack budget. We explore two randomization schemes: (1) a model selected at random, either singly or as an ensemble, from a set of DNNs is surprisingly more robust against the strongest form of complete-knowledge attacks (a.k.a. white-box attacks); (2) adding a small Gaussian random noise to its learned weights can increase a DNN model's resilience to adversarial attacks by as much as 74.2%. We compare the two randomization techniques with the Ensemble Adversarial Training technique and show that our randomization techniques are superior under different attack budget constraints. Furthermore, we explore the relationship between attack severity and decision boundary robustness in the version space. Finally, we connect the effectiveness of randomization in preventing attack transferability to the variability of DNN models by analyzing the differential entropy of sampled hypotheses in the hypothesis space.
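
The two randomization schemes can be illustrated with a minimal sketch; this is not the authors' implementation, and the pool of models, the noise scale `sigma`, and the function names are hypothetical placeholders used only to make the ideas concrete.

```python
# Illustrative sketch of the two randomization schemes described above.
# Assumes PyTorch; model pool, noise scale, and names are hypothetical.
import copy
import random
import torch

def randomly_selected_model(model_pool):
    """Scheme 1: at inference time, answer with a DNN drawn at random
    from a pool of independently trained models (or a random ensemble),
    so the white-box attacker cannot know which model will respond."""
    return random.choice(model_pool)

def gaussian_weight_perturbation(model, sigma=0.01):
    """Scheme 2: add small i.i.d. Gaussian noise to the learned weights
    before serving the model, leaving the original weights untouched."""
    noisy = copy.deepcopy(model)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * sigma)
    return noisy

# Usage sketch: `model_pool` would hold several trained classifiers.
# prediction = randomly_selected_model(model_pool)(x)
# prediction = gaussian_weight_perturbation(model_pool[0])(x)
```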
