Abstract

Adversarial attacks and defenses are an active research topic in deep learning. Because deep learning models are vulnerable, they are easily fooled by adversarial examples: inputs with carefully crafted perturbations that humans still classify correctly but that cause a model to produce an incorrect output with high confidence. How such examples work remains a central and difficult research question. In this work, we study the nature of adversarial attacks from the perspective of the main and minor features obtained by PCA (principal component analysis). First, we find that deep learning models mainly learn the main features of the data rather than the minor features. Second, we find that perturbations applied to the main features are more likely to cause misclassification, while perturbations applied to the minor features have little effect. Finally, we propose a method that generates adversarial examples within the sample subspace. Experimental results on both MNIST and CIFAR-10 show that the proposed method produces smaller adversarial perturbations, which are harder for human eyes to detect and more aggressive in white-box attacks on deep learning models.
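
The abstract does not give implementation details. As one hedged illustration of the main-feature idea, the sketch below uses PCA to extract the top-k principal components ("main features") of the data and then projects an FGSM-style perturbation onto that principal subspace. The classifier `model`, the choice of k, the FGSM step, and all names here are assumptions for illustration, not the authors' method.

```python
# A minimal sketch (not the authors' released code) of the idea described above:
# obtain the "main features" (top-k principal components) of the data with PCA,
# then constrain an FGSM-style perturbation to lie in that principal subspace.
import torch
import torch.nn.functional as F
from sklearn.decomposition import PCA


def main_feature_fgsm(model, x, y, components, eps=0.1):
    """Take an FGSM step and project the perturbation onto the main-feature subspace.

    model      -- a differentiable classifier (torch.nn.Module), assumed pretrained
    x          -- flattened input batch of shape (N, D), values in [0, 1]
    y          -- true labels of shape (N,)
    components -- array of shape (k, D): orthonormal top-k principal directions
    eps        -- perturbation budget
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()

    # Standard FGSM perturbation.
    delta = eps * x.grad.detach().sign()

    # Project the perturbation onto span(V), the subspace of the main features:
    # delta_main = delta V^T V, where the rows of V are orthonormal.
    V = torch.as_tensor(components, dtype=x.dtype)
    delta_main = delta @ V.T @ V

    return (x.detach() + delta_main).clamp(0.0, 1.0)


# Usage sketch with hypothetical X_train (N, D), x_batch, y_batch, and model:
# pca = PCA(n_components=50).fit(X_train)   # top-50 "main features" of the training data
# x_adv = main_feature_fgsm(model, x_batch, y_batch, pca.components_, eps=0.1)
```

Projecting the perturbation (rather than the gradient) keeps the final perturbation exactly inside the principal subspace, matching the intuition that changes along main features are the ones the model is sensitive to.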
