Abstract
Deep neural networks (DNNs) provide superior performance on machine learning tasks such as image recognition, speech recognition, pattern analysis, and intrusion detection. However, an adversarial example, created by adding a small amount of noise to an original sample, can cause a DNN to misclassify. This is a serious threat because the added noise is imperceptible to the human eye. For example, if an attacker modifies a right-turn sign so that it is misclassified as a left-turn sign, an autonomous vehicle using the DNN will classify the modified sign as pointing to the left, whereas a person will correctly read it as pointing to the right. Defenses against such adversarial examples are being actively studied. Existing defenses require an additional process, such as changing the classifier or modifying the input data. In this paper, we propose a new method for detecting adversarial examples that requires no such additional process. The proposed scheme detects adversarial examples by using a pattern feature of their classification scores. We used MNIST and CIFAR10 as experimental datasets and TensorFlow as the machine learning library. The experimental results show that the proposed method detects adversarial examples with success rates of 99.05% and 99.9% for the untargeted and targeted cases in MNIST, respectively, and 94.7% and 95.8% for the untargeted and targeted cases in CIFAR10, respectively.
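For illustration only, the sketch below shows one common way such small noise is crafted: the Fast Gradient Sign Method (FGSM), written against the TensorFlow 2 API since the paper uses TensorFlow. This is not the paper's own attack; the function name, model argument, and epsilon value are assumptions for this example.

import tensorflow as tf

def fgsm_example(model, x, y_true, epsilon=0.1):
    """Add a small, signed perturbation that can flip the model's prediction."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        logits = model(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(
            y_true, logits, from_logits=True)
    grad = tape.gradient(loss, x)           # gradient of the loss w.r.t. the input
    noise = epsilon * tf.sign(grad)         # small noise, imperceptible for small epsilon
    return tf.clip_by_value(x + noise, 0.0, 1.0)  # keep pixels in the valid range

The point of the sketch is that the perturbation is bounded by epsilon per pixel, which is why the modified sample looks unchanged to a person while the classifier's output shifts.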
Highlights
Deep neural networks (DNNs) [26] provide excellent performance on machine learning tasks such as image recognition [28], speech recognition [10, 11], pattern analysis [4], and intrusion detection [24]
DNNs are vulnerable to adversarial examples [29, 33], in which a small amount of noise has been added to an original sample
The threshold is the value applied as the judgment criterion: if the classification score difference is less than the threshold, the input is identified as an adversarial example, and if it is greater than the threshold, the input is determined to be an original sample (see the sketch below)
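As a minimal sketch of this criterion, the code below flags an input when the gap between its classification scores falls below the threshold. The assumption that the "classification score difference" means the gap between the two highest scores, and the names is_adversarial and score_diff, are ours for illustration, not the paper's.

import numpy as np

def is_adversarial(scores, threshold):
    """Detection rule from the highlight above: a small gap between the two
    highest classification scores marks the input as adversarial."""
    top_two = np.sort(scores)[-2:]          # two largest class scores
    score_diff = top_two[1] - top_two[0]    # difference used as the criterion
    return score_diff < threshold           # below the threshold -> adversarial

# Example: a confident original sample vs. a suspect with nearly tied scores
print(is_adversarial(np.array([0.01, 0.02, 0.95, 0.02]), threshold=0.5))  # False
print(is_adversarial(np.array([0.30, 0.35, 0.33, 0.02]), threshold=0.5))  # True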
Summary
Deep neural networks (DNNs) [26] provide excellent performance on machine learning tasks such as image recognition [28], speech recognition [10, 11], pattern analysis [4], and intrusion detection [24]. If an attacker generates a modified left-turn sign so that it will be incorrectly categorized by a DNN, an autonomous vehicle using the DNN will incorrectly classify the modified sign as pointing to the right, but a person will correctly classify it as pointing to the left. Because such adversarial examples are serious threats to a DNN, several defenses against them are being studied. Adversarial examples can be detected using a pattern characteristic of the classification scores, without the need to change the classifier or modify the input data. We propose a defense system for detecting adversarial examples that requires no modification to the classifier or the input data, making use of this pattern feature of the classification scores.