Abstract

Deep neural networks (DNNs) have achieved considerable success in a variety of tasks, and this success has increased the importance of the security and robustness of the algorithms in which DNNs are applied. Previous studies have shown that neural networks are vulnerable to adversarial examples, which are generated by adding a small amount of noise to input images. Detection-based methods, which identify adversarial examples among the input images, are a popular defense against adversarial attacks. However, these methods reject the detected adversarial images, making it difficult to use them for further analysis. In this study, we propose a novel denoising-based detection method that simultaneously denoises and detects an input image when defending against adversarial examples. Our method analyzes images in the feature maps of the target classifier's layers based on consistency in the Fourier domain. We first generate adversarial examples and train the denoising network by minimizing the difference between non-adversarial images and denoised adversarial images in the Fourier-transformed feature maps. The denoised images are then input to the adversarial example detector, which determines whether an image is adversarial by exploring the characteristics of the concatenated input and denoised images. This paper makes three main contributions: (1) we propose a novel denoising-based detection method that simultaneously denoises and detects an input image when defending against adversarial examples; (2) we improve the denoising of adversarial examples by reconstructing them from Fourier-transformed feature maps; and (3) we improve detection by using the denoised images in addition to the input images. We evaluated the proposed method against the fast gradient sign method, the basic iterative method, projected gradient descent, DeepFool, and the Carlini and Wagner attack on two Canadian Institute for Advanced Research (CIFAR) datasets, CIFAR-10 and CIFAR-100. Our method, which combines a frequency-based denoising mechanism with a detection mechanism, achieved significant improvements in denoising and detection performance compared with other state-of-the-art methods.
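To make the two mechanisms described above concrete, the following is a minimal PyTorch-style sketch of (a) a denoising loss computed on Fourier-transformed feature maps and (b) a detector that operates on an input image concatenated with its denoised version. The module names (denoiser, classifier_features, detector), the L1 comparison of magnitude spectra, and the channel-wise concatenation are assumptions made for illustration, not the authors' implementation.

```python
import torch


def fourier_feature_loss(feat_clean, feat_denoised):
    """Distance between the Fourier transforms of a clean image's feature map
    and the corresponding denoised adversarial image's feature map."""
    f_clean = torch.fft.fft2(feat_clean)
    f_denoised = torch.fft.fft2(feat_denoised)
    # Compare magnitude spectra with an L1 distance (an assumption; the paper
    # may use a different norm or also compare phase).
    return (f_clean.abs() - f_denoised.abs()).abs().mean()


def training_step(denoiser, classifier_features, detector,
                  x_clean, x_adv, det_criterion):
    """One hypothetical training step combining the denoising and detection losses."""
    # 1) Denoise the adversarial example.
    x_denoised = denoiser(x_adv)

    # 2) Fourier-domain consistency between the clean and denoised feature maps,
    #    taken from an intermediate layer of the (frozen) target classifier.
    with torch.no_grad():
        feat_clean = classifier_features(x_clean)
    feat_denoised = classifier_features(x_denoised)
    loss_denoise = fourier_feature_loss(feat_clean, feat_denoised)

    # 3) The detector sees each input concatenated channel-wise with its
    #    denoised version and predicts adversarial (1) vs. clean (0).
    pair_adv = torch.cat([x_adv, x_denoised], dim=1)
    pair_clean = torch.cat([x_clean, denoiser(x_clean)], dim=1)
    logits = detector(torch.cat([pair_adv, pair_clean], dim=0))
    labels = torch.cat([torch.ones(x_adv.size(0)),
                        torch.zeros(x_clean.size(0))]).long().to(logits.device)
    loss_detect = det_criterion(logits, labels)

    return loss_denoise, loss_detect
```

In this sketch, det_criterion could be a standard cross-entropy loss, and classifier_features would be an intermediate layer of the pretrained target classifier; how the two losses are weighted and scheduled is left unspecified here.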
