Abstract
State-of-the-art neural network models are actively used in various fields, but it is well known that they are vulnerable to adversarial example attacks. Despite many efforts, making models robust against adversarial example attacks has turned out to be a very difficult task. While many defense approaches have been shown to be ineffective, adversarial training remains one of the promising methods. In adversarial training, the training data are augmented with “adversarial” samples generated using an attack algorithm. If the attacker uses a similar attack algorithm to generate adversarial examples, the adversarially trained network can be quite robust to the attack. However, there are numerous ways of creating adversarial examples, and the defender does not know which algorithm the attacker may use. A natural question is: can we use adversarial training to train a model robust to multiple types of attack? Previous work has shown that, when a network is trained with adversarial examples generated from multiple attack methods, it is still vulnerable to white-box attacks, where the attacker has complete access to the model parameters. In this paper, we study this question in the context of black-box attacks, which is a more realistic assumption for practical applications. Experiments with the MNIST dataset show that adversarially training a network with one attack method helps defend against that particular method, but has limited effect against other attack methods. In addition, even if the defender trains a network with multiple types of adversarial examples and the attacker attacks with one of those methods, the network can still lose accuracy if the attacker uses a different data augmentation strategy on the target network. These results show that it is very difficult to make a robust network using adversarial training, even in black-box settings where the attacker has restricted information on the target network.
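To make the adversarial training procedure described above concrete, the following is a minimal sketch of one training epoch in PyTorch, using the fast gradient sign method (FGSM) as the attack algorithm. FGSM is just one well-known attack method, chosen here for illustration; the names model, train_loader and optimizer and the perturbation budget epsilon are assumed placeholders, not the paper's actual implementation.

import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.3):
    # FGSM: perturb the input along the sign of the loss gradient w.r.t. the input.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid [0, 1] range

def adversarial_training_epoch(model, train_loader, optimizer, epsilon=0.3):
    # Adversarial training: augment each batch with adversarial copies of itself.
    model.train()
    for x, y in train_loader:
        x_adv = fgsm_example(model, x, y, epsilon)
        optimizer.zero_grad()  # discard gradients accumulated while crafting x_adv
        loss = F.cross_entropy(model(torch.cat([x, x_adv])), torch.cat([y, y]))
        loss.backward()
        optimizer.step()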
Highlights
Deep neural networks (DNNs) have recently been showing high performance in various applications such as computer vision [1,2], natural language processing [3] and speech recognition [4]. At the same time, a major vulnerability of existing neural networks has been pointed out: their vulnerability to adversarial examples
A natural question is: Can we use adversarial training to train a model robust to multiple types of attack? Previous work has shown that, when a network is trained with adversarial examples generated from multiple attack methods, the network is still vulnerable to white-box attacks, where the attacker has complete access to the model parameters
To find out if adversarial training is effective in defending against multiple types of adversarial examples, we train a neural network model using adversarial examples created from various well-known attack methods; a sketch of such a multi-attack setup follows this list
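The following is a hedged sketch of that multi-attack setup, reusing the FGSM helper from the sketch above and adding a projected gradient descent (PGD) attack; the hyperparameter values and function names are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn.functional as F

def pgd_example(model, x, y, epsilon=0.3, alpha=0.03, steps=10):
    # PGD: iterated FGSM steps, projected back onto the L-infinity ball around x.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # project onto the ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def multi_attack_batch(model, x, y, attacks):
    # One training batch: the clean data plus one adversarial copy per attack method.
    parts = [x] + [attack(model, x, y) for attack in attacks]
    return torch.cat(parts), y.repeat(len(parts))

# Example: batch_x, batch_y = multi_attack_batch(model, x, y, [fgsm_example, pgd_example])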
Summary
Deep neural networks (DNNs) have recently been showing high performance in various applications such as computer vision [1,2], natural language processing [3] and speech recognition [4]. Tramer et al. [13] tried to answer this question in the context of a white-box attack, where the attacker has complete access to the parameters of the trained model. They found that different types of adversarial examples, such as examples created using L∞- and L1-perturbations, are mutually exclusive, which means that adversarially training with one type of adversarial example does not help defend against the other types. In the black-box setting, by contrast, the attacker cannot access the model parameters, and repeatedly querying the model would incur time and monetary cost [15]. Under these assumptions, the attacker trains a target network of his own and uses it to apply an attack algorithm and create adversarial examples. To find out whether adversarial training is effective in defending against multiple types of adversarial examples, we train a neural network model using adversarial examples created from various well-known attack methods. The results show that it is very difficult to train a robust network using adversarial training, because the attack methods and other training configurations, such as the data augmentation strategy, can affect the robustness of the network against black-box attacks
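As a rough illustration of this transfer-based black-box setting, the sketch below crafts adversarial examples on the attacker's own network and measures how well they fool the defender's model; victim, substitute and attacker_loader are hypothetical names, and attack_fn can be any per-batch attack such as the FGSM helper sketched earlier.

import torch

def black_box_attack_accuracy(victim, substitute, attacker_loader, attack_fn):
    # Gradients come only from the attacker's substitute; the victim is merely queried.
    victim.eval()
    correct, total = 0, 0
    for x, y in attacker_loader:
        x_adv = attack_fn(substitute, x, y)
        with torch.no_grad():
            pred = victim(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total  # victim accuracy under the transferred attack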