Abstract

In this paper, we propose a new method for detecting adversarial attacks on deep neural networks. Our algorithm builds on the intuition that re-attacking an input image produces displacement vectors of different magnitudes for clean and adversarial inputs: if the input is an adversarial example, the re-attacking process yields a short displacement vector in the feature space, whereas for clean images the displacement is considerable. We train our detector on these displacement vectors. The experimental results show that, compared to current learning-based adversarial detection methods, the proposed system detects adversarial examples with a far simpler network. In addition, the proposed method is independent of the attack type and can detect even novel attacks. We also show that the proposed system learns the discrimination function from a small amount of training data without any hyper-parameter tuning. We obtain remarkable results in detecting adversarial examples located both near and far from the decision boundary, improving on the state of the art in detecting the 2-norm Carlini and Wagner attack (L2-C&W) and the ∞-norm Projected Gradient Descent attack (L∞-PGD), while only the Fast Gradient Sign Method (FGSM) is used to train the system.
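The detection pipeline described above (re-attack the input, measure the resulting feature-space displacement, and classify that displacement with a small network) could be realized roughly as in the following sketch. This is a minimal PyTorch illustration under several assumptions: FGSM is used here as the re-attack step (the abstract only states that FGSM is used to train the detector), and the epsilon value, function names, and two-layer detector architecture are illustrative choices, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_reattack(model, x, eps=0.03):
    """One FGSM step against the model's current prediction (the 're-attack').

    The true label is unknown at test time, so the currently predicted
    class is attacked instead. (Assumed detail, not from the abstract.)
    """
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    pred = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, pred)
    grad, = torch.autograd.grad(loss, x)
    x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()

def displacement_vector(feature_extractor, x, x_reattacked):
    """Feature-space displacement caused by re-attacking the input."""
    with torch.no_grad():
        return feature_extractor(x_reattacked) - feature_extractor(x)

class DisplacementDetector(nn.Module):
    """A deliberately small detector operating on displacement vectors."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 2),  # clean vs. adversarial
        )

    def forward(self, d):
        return self.net(d)
```

In this sketch, the detector would presumably be trained on displacement vectors computed for clean images and for their FGSM-attacked counterparts, with binary labels indicating whether the original input was adversarial; short displacements would then be indicative of inputs that were already adversarial.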
