Abstract

Modern image classification networks are vulnerable to adversarial attacks that deliberately alter the input image by adding small, often imperceptible, perturbations that mislead the network's classification result. From the perspective of manifold theory, most adversarial examples generated by current attack methods lie off the data manifold and can therefore be detected by manifold-based adversarial defenses. In this study, we propose a novel adversarial attack method for crafting on-manifold adversarial examples. Specifically, we use an adversarial autoencoder (AAE) to learn the low-dimensional manifold of the data, and design a latent substitute model based on this manifold to ensure that the generated adversarial examples remain on-manifold. Furthermore, to overcome the difficulty of limiting the perturbation of adversarial examples when searching in the latent space, we propose a gradient decoding strategy (GDS) and a confidence re-ranking strategy (CRS). The results demonstrate that our method performs competitively against current attack methods when applied to state-of-the-art defense models, especially those based on manifold theory. Moreover, when the proposed method is used for adversarial training, the robustness of the model improves without reducing classification accuracy on the original dataset. We hope that the proposed attack can serve as a benchmark for evaluating and improving the robustness of networks.
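To make the latent-space idea concrete, below is a minimal sketch of crafting on-manifold adversarial examples by searching in the latent space of an autoencoder. The `Decoder` class, the `latent_space_attack` function, and all hyperparameters (`steps`, `step_size`, `eps`) are illustrative assumptions, not the paper's actual architecture; the simple signed-gradient loop stands in for the latent substitute model, GDS, and CRS, whose details are not given in the abstract. The key property it does illustrate: every candidate image is produced by decoding a latent code, so it stays on (an approximation of) the learned data manifold.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Decoder(nn.Module):
    """Stand-in AAE decoder: maps a latent code z to image space."""

    def __init__(self, latent_dim: int = 32, img_dim: int = 784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Sigmoid(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def latent_space_attack(decoder, classifier, z, label,
                        steps=50, step_size=0.01, eps=0.5):
    """Search for an adversarial latent code near z.

    Candidates are always decoded from the latent space, so the
    resulting image lies on the decoder's learned manifold.
    """
    delta = torch.zeros_like(z, requires_grad=True)
    for _ in range(steps):
        x_adv = decoder(z + delta)
        loss = F.cross_entropy(classifier(x_adv), label)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()  # ascend the classification loss
            delta.clamp_(-eps, eps)                 # bound the latent perturbation
            delta.grad.zero_()
    return decoder(z + delta).detach()


# Usage sketch (hypothetical pre-trained models):
#   z = encoder(x)  # latent code of a correctly classified input x
#   x_adv = latent_space_attack(decoder, classifier, z, label)
```

Bounding the perturbation on the latent code `delta` rather than on the pixels is what distinguishes this from standard pixel-space attacks such as PGD; the trade-off the paper's GDS and CRS address, controlling the size of the resulting image-space perturbation, is not handled by this naive clamp.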
