Abstract
It is important to detect adversarial samples in the physical world that lie far from the training data distribution. Some adversarial samples can make a machine learning model produce a highly overconfident output distribution at test time. We therefore propose a mechanism for detecting adversarial samples based on a semisupervised generative adversarial network (GAN) with an encoder-decoder structure; the mechanism can be applied to any pretrained neural network without changing the network's structure. The semisupervised GAN also gives us insight into the behavior of adversarial samples and their flow through the layers of a deep neural network. In the supervised scenario, the latent feature (or the discriminator's output score) of the semisupervised GAN and the target network's logits are used as the input of a logistic regression classifier that detects adversarial samples. In the unsupervised scenario, we first propose a one-class classifier based on a semisupervised Gaussian mixture conditional generative adversarial network (GM-CGAN) to fit the joint feature information of the normal data, and we then use a discriminator network to separate normal data from adversarial samples. In both the supervised and unsupervised scenarios, experimental results show that our method outperforms the latest methods.
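The supervised detection step described above can be pictured with a minimal sketch: GAN-side information (latent codes or discriminator scores) is concatenated with the target network's logits and fed to a logistic regression detector. The arrays `gan_features` and `target_logits` below are hypothetical stand-ins, not the paper's actual pipeline; in practice they would be computed by the semisupervised GAN and the pretrained target network for both normal and adversarial inputs.

```python
# Minimal sketch of the supervised detector, assuming precomputed features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: replace with GAN encoder/discriminator outputs and
# target-network logits for normal and adversarial samples.
n_normal, n_adv, d_latent, n_classes = 500, 500, 64, 10
gan_features = rng.normal(size=(n_normal + n_adv, d_latent))
target_logits = rng.normal(size=(n_normal + n_adv, n_classes))
labels = np.concatenate([np.zeros(n_normal), np.ones(n_adv)])  # 1 = adversarial

# Joint feature vector: GAN-side information concatenated with the logits.
X = np.concatenate([gan_features, target_logits], axis=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0
)

detector = LogisticRegression(max_iter=1000)
detector.fit(X_train, y_train)
print("detection accuracy:", detector.score(X_test, y_test))
```

Because the detector operates only on features extracted from the pretrained network and the GAN, the target network itself never needs to be modified or retrained.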
Highlights
Deep neural networks (DNNs) have achieved high accuracy in many classification tasks, such as speech recognition [1], object detection [2], and image classification [3].
One such method relies on adversarial training, which adds adversarial samples during the training phase [7]. This method is robust to a variety of adversarial attacks but is ineffective against certain others.
Grosse et al. [18] showed that adversarial samples have a different distribution from normal data. Motivated by this finding, we study the feature distribution of normal samples through semisupervised generative adversarial networks (GANs) in the present article. There are differences in the feature distributions of real samples and adversarial samples when adversarial samples are input to the generator (see the sketch below).
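One simple way to make this distributional difference concrete is to compare summary statistics of the features for a normal batch and an adversarial batch. The sketch below is only illustrative: `feature_stats` and the synthetic feature batches are assumptions, standing in for features produced by the GAN for real and adversarial inputs.

```python
# Hedged sketch: compare per-dimension feature statistics of normal vs.
# adversarial batches to expose a distribution gap.
import numpy as np

def feature_stats(features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-dimension mean and standard deviation of a feature batch."""
    return features.mean(axis=0), features.std(axis=0)

# Stand-in feature batches; in practice these would be GAN-derived features
# for normal test inputs and their adversarially perturbed counterparts.
rng = np.random.default_rng(1)
normal_feats = rng.normal(loc=0.0, scale=1.0, size=(256, 64))
adv_feats = rng.normal(loc=0.4, scale=1.3, size=(256, 64))

mu_n, sd_n = feature_stats(normal_feats)
mu_a, sd_a = feature_stats(adv_feats)

# A large mean shift relative to the normal spread signals a distribution gap.
shift = np.abs(mu_a - mu_n) / (sd_n + 1e-8)
print("mean normalized shift across feature dimensions:", shift.mean())
```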
Summary
Deep neural networks (DNNs) have achieved high accuracy in many classification tasks, such as speech recognition [1], object detection [2], and image classification [3]. Although DNNs are robust to random noise, small perturbations that are hard for humans to detect can mislead the model and cause it to output erroneous predictions. Several methods have been proposed to protect DNNs against such attacks. One such method relies on adversarial training, which adds adversarial samples during the training phase [7]. When the parameters and structure of the neural network are fixed, however, such methods cannot be applied without modifying the network structure or retraining the network.