Abstract

Deep neural networks (DNNs) have delivered state-of-the-art performance on many challenging tasks, such as computer vision, but they are vulnerable to adversarial attacks. Adversarial training, which augments the training data with adversarial examples, has empirically proven to be the most effective defense against adversarial attacks. Motivated by the fact that intermediate layers play a highly important role in maintaining a robust model, we propose extending conventional adversarial training, which perturbs only the network input, to a layer-wise training scheme. Unlike previous studies that trained robust DNN models layer by layer, the layer perturbation introduced by our method is theoretically proven to be equivalent to adversarial manipulation of the network input, which guarantees an improvement in the adversarial robustness of the DNN models to which it is applied. We empirically evaluated both shallow and deep CNN models, including VGG16 and WideResNet28-10, on the MNIST, CIFAR-10, and CIFAR-100 datasets. The results consistently show that the proposed layer-wise adversarial training significantly outperforms conventional adversarial training and defends against all mainstream attacks, including FGSM, IFGSM, PGD, EoT, and C&W. Combining the layer-wise training regime with conventional adversarial training makes it possible to achieve excellent defense performance.
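To make the core idea concrete, the sketch below shows one plausible form of a layer-wise adversarial training step in PyTorch: an FGSM-style perturbation is computed on an intermediate activation rather than on the input, and the loss on the perturbed activation drives the parameter update. This is a minimal illustration under stated assumptions, not the paper's exact algorithm; the model split point, the `eps` value, and the names `SplitNet` and `layerwise_adv_loss` are all hypothetical.

```python
# Illustrative sketch only: FGSM-style perturbation applied to an
# intermediate activation instead of the network input. The split point,
# epsilon, and all names are assumptions, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitNet(nn.Module):
    """Toy CNN split into 'front' and 'back' halves so an intermediate
    activation can be perturbed between them."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.front = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.back = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes),
        )

    def forward(self, x):
        return self.back(self.front(x))

def layerwise_adv_loss(model, x, y, eps=0.05):
    """One layer-wise adversarial step: perturb the hidden activation
    h = front(x) in the direction that increases the loss (FGSM-style)."""
    h = model.front(x)
    # Compute the loss gradient w.r.t. a detached copy of the activation.
    h_adv = h.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model.back(h_adv), y)
    grad = torch.autograd.grad(loss, h_adv)[0]
    # Add the sign perturbation to h itself so gradients still flow
    # through model.front during the training update.
    h_pert = h + eps * grad.sign()
    return F.cross_entropy(model.back(h_pert), y)

# Usage sketch:
# model = SplitNet()
# opt = torch.optim.SGD(model.parameters(), lr=0.1)
# for x, y in loader:
#     opt.zero_grad()
#     layerwise_adv_loss(model, x, y).backward()
#     opt.step()
```

Note the design choice in `layerwise_adv_loss`: the perturbation direction is computed on a detached copy of the activation, but the perturbed loss is taken through the original activation, so both halves of the network receive gradient updates, mirroring how conventional adversarial training backpropagates through adversarially perturbed inputs.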
