Abstract

The vulnerability of Deep Neural Network (DNN) models to maliciously crafted adversarial perturbations is a critical concern given their ongoing large-scale deployment. In this work, we study the impact on adversarial perturbations of a phenomenon that occurs when an image is reinjected multiple times into a DNN, following a procedure called reverberation that was first proposed in cognitive psychology to mitigate catastrophic forgetting. We describe reverberation in vanilla autoencoders and propose a new reverberant architecture combining a classifier and an autoencoder, which allows the joint observation of the logits and the reconstructed images. We experimentally measure the impact of reverberation on adversarial perturbations in an adversarial example detection scenario. The results show that clean and adversarial examples, even with small perturbation levels, behave very differently throughout reverberation. While computationally efficient (reverberation relies only on inference passes), our approach yields promising results for adversarial example detection, consistent across datasets, adversarial attacks and DNN architectures.
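
To make the procedure concrete, the following is a minimal sketch of a reverberation loop in the spirit of the proposed architecture, assuming generic pretrained `autoencoder` and `classifier` modules; the function name, signatures and number of steps are illustrative assumptions, not the authors' exact implementation.

    # Hypothetical sketch: reverberation loop over an autoencoder, with joint
    # observation of classifier logits and reconstructions at each step.
    import torch

    @torch.no_grad()
    def reverberate(x, autoencoder, classifier, n_steps=10):
        """Repeatedly reinject an image through the autoencoder and record
        the classifier logits and the reconstruction at every step."""
        logits_trace, recon_trace = [], []
        current = x
        for _ in range(n_steps):
            current = autoencoder(current)            # one reverberation step
            logits_trace.append(classifier(current))  # observe the logits
            recon_trace.append(current)               # keep the reconstruction
        return torch.stack(logits_trace), torch.stack(recon_trace)

A detector could then compare how the logits and reconstructions evolve across steps for a given input, since clean and adversarial examples are reported to behave very differently throughout reverberation.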
