Abstract

Recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial examples, which contain subtle, human-imperceptible perturbations. Although numerous countermeasures have been proposed and play a significant role, most have flaws and are effective only against certain types of adversarial examples. In this paper, we present a novel and universal countermeasure that recovers multiple types of adversarial examples to benign examples before they are fed into the deep neural network. The idea is to model the mapping between adversarial examples and benign examples with a generative adversarial network (GAN). The architecture consists of a generator based on U-Net, a discriminator based on ACGAN, and a newly added third-party classifier. The U-Net enhances the generator's capacity to recover adversarial examples to benign examples. The loss function combines the advantages of ACGAN and WGAN-GP to stabilize the training process and accelerate its convergence. In addition, a classification loss and a perceptual loss, both derived from the third-party classifier, further improve the generator's ability to eliminate adversarial perturbations. Experiments are conducted on the MNIST, CIFAR-10, and ImageNet datasets. First, we perform ablation experiments to validate the proposed countermeasure. Then, we defend against seven types of state-of-the-art adversarial examples on four deep neural networks and compare against six existing countermeasures. The experimental results demonstrate that the proposed countermeasure is universal and outperforms the alternatives. The experimental code is available at https://github.com/Afreadyang/IAED-GAN.
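To make the loss composition concrete, the following is a minimal NumPy sketch, not the authors' implementation (which is at the GitHub URL above). It uses a toy linear critic so the WGAN-GP gradient penalty is analytic, and hypothetical stand-ins (`critic`, `cross_entropy`, random logits and features) for the ACGAN discriminator and the third-party classifier that supplies the classification and perceptual terms.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)      # toy linear critic weights: D(x) = w . x
real = rng.normal(size=8)   # flattened benign example
fake = rng.normal(size=8)   # generator output G(x_adv) (toy stand-in)

def critic(x):
    """Toy critic score D(x); a real model would be a deep network."""
    return w @ x

# WGAN-GP critic loss: D(fake) - D(real) + lam * (||grad_x D||_2 - 1)^2,
# with the gradient taken at a point interpolated between real and fake.
# For a linear critic, grad_x D(x) = w everywhere.
lam = 10.0
eps = rng.uniform()
interp = eps * real + (1.0 - eps) * fake  # interpolation point (unused by
grad_norm = np.linalg.norm(w)             # the analytic gradient of a
gp = lam * (grad_norm - 1.0) ** 2         # linear critic, shown for form)
critic_loss = critic(fake) - critic(real) + gp

def cross_entropy(logits, label):
    """Stable softmax cross-entropy for a single example."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

# Auxiliary terms from the third-party classifier (all toy values here):
# a classification loss on the recovered image's logits, and a perceptual
# loss comparing classifier features of benign vs. recovered images.
logits_fake = rng.normal(size=10)
feat_real = rng.normal(size=16)
feat_fake = rng.normal(size=16)

cls_loss = cross_entropy(logits_fake, label=3)
perc_loss = np.mean((feat_real - feat_fake) ** 2)

# Generator objective: adversarial term plus the two auxiliary terms
# (relative weights omitted; the paper would specify them).
gen_loss = -critic(fake) + cls_loss + perc_loss
print(float(gen_loss))
```

The gradient penalty pushes the critic toward unit gradient norm (the WGAN-GP Lipschitz constraint), while the classification and perceptual terms pull the generator's output toward images the classifier both labels correctly and represents similarly to the benign original.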
