Abstract

In deep-learning image classification, adversarial examples, i.e., inputs intentionally perturbed by small-magnitude perturbations, can mislead deep neural networks (DNNs) into incorrect predictions, which shows that DNNs are vulnerable to them. Various attack and defense strategies have been proposed to better study the mechanisms of deep learning. However, most existing work addresses only one side, either attack or defense, which limits the improvement of offensive and defensive performance and makes it difficult for the two to promote each other within the same framework. In this paper, we propose the Cycle-Consistent Adversarial GAN (CycleAdvGAN) to generate adversarial examples; it learns and approximates the distributions of both the original instances and the adversarial examples, allowing the attacker and the defender to confront each other and improve their abilities. Once the generators GA and GD are trained, GA can efficiently generate adversarial perturbations for any instance, improving on existing attack methods, while GD can recover adversarial examples back to clean instances, defending against existing attack methods. We apply CycleAdvGAN under semi-white-box and black-box settings on two public datasets, MNIST and CIFAR-10. Extensive experiments show that our method achieves state-of-the-art adversarial attack performance and also efficiently improves defense ability, realizing the integration of adversarial attack and defense. In addition, it improves the attack effect even when trained only on an adversarial dataset generated by a single kind of adversarial attack.
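As a concrete illustration of this setup, below is a minimal PyTorch sketch (not the authors' released code) of how the two generators could be used at inference time. The `TinyGenerator` module, the perturbation budget `eps`, and the MNIST-sized input shape are hypothetical stand-ins for the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Hypothetical stand-in generator: a small conv net mapping an image to an image-shaped output."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

G_A = TinyGenerator()   # attack generator: instance -> perturbation
G_D = TinyGenerator()   # defense generator: adversarial example -> recovered instance

x = torch.rand(8, 1, 28, 28)                 # a batch of MNIST-sized inputs in [0, 1]
eps = 0.3                                    # assumed perturbation budget
perturbation = eps * G_A(x)                  # bounded perturbation produced by G_A
x_adv = torch.clamp(x + perturbation, 0, 1)  # adversarial example stays a valid image
x_rec = torch.clamp(G_D(x_adv), 0, 1)        # G_D maps it back toward the clean instance
```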

Highlights

  • With their rapid development, Deep Neural Networks (DNNs) have achieved great success in various tasks such as image recognition [1], text processing [2], and speech recognition [3]

  • Inspired by CycleGAN for style transfer, which relies on cycle-consistency constraints, we introduce variants of cycle-consistency losses to ensure the successful integration of adversarial attack and defense. The adversarial loss constrains the output of the generator, after being added to the input, to look like an instance of the target domain (see the loss sketch after this list)

  • To interpret why CycleAdvGAN demonstrates better transferability, we further examine the update directions given by the fast gradient sign method (FGSM), the Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and CycleAdvGAN along the iterations
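
The cycle-consistency and attack/defense loss terms mentioned in the second highlight could be sketched as follows. This is a hedged approximation that reuses the hypothetical generators `G_A` and `G_D` from the earlier sketch and assumes a target classifier `f`; it omits the GAN discriminator terms, so the weighting and exact formulation in the paper may differ.

```python
import torch
import torch.nn.functional as F

def cycle_losses(x, y, G_A, G_D, f, eps=0.3):
    """Sketch of the three loss terms on a batch (x, y) of images and labels."""
    x_adv = torch.clamp(x + eps * G_A(x), 0, 1)   # attack direction
    x_rec = torch.clamp(G_D(x_adv), 0, 1)         # defense direction
    # Cycle consistency: recovering the adversarial example should give back x.
    loss_cycle = F.l1_loss(x_rec, x)
    # Attack loss: the perturbed input should fool the classifier f.
    loss_attack = -F.cross_entropy(f(x_adv), y)
    # Defense loss: the recovered input should be classified correctly again.
    loss_defense = F.cross_entropy(f(x_rec), y)
    return loss_cycle, loss_attack, loss_defense
```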

Summary

Introduction

With their rapid development, Deep Neural Networks (DNNs) have achieved great success in various tasks such as image recognition [1], text processing [2], and speech recognition [3]. However, DNNs have been proved to be vulnerable and susceptible to adversarial examples [4]: carefully crafted samples that look similar to natural images but are designed to mislead a pretrained model. A straightforward approach to crafting them is to change pixel values simultaneously in the direction of the gradient, as in the fast gradient sign method (FGSM) [8] and its iterative variant, the Basic Iterative Method (BIM) [9]. These methods find perturbations quickly, at the expense of controllability; a sketch of both is given below.
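The following is a minimal, textbook-style PyTorch sketch of FGSM and BIM for reference, not the paper's implementation. It assumes a differentiable classifier `model` trained with cross-entropy, inputs in [0, 1], and hypothetical step sizes `eps` and `alpha`.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.3):
    """Single-step FGSM: move each pixel by eps in the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return torch.clamp(x + eps * x.grad.sign(), 0, 1).detach()

def bim(model, x, y, eps=0.3, alpha=0.05, steps=10):
    """Iterative FGSM (BIM): repeated small steps, projected back into the eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()
        # Project back into the eps-ball around x and into the valid image range.
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1).detach()
    return x_adv
```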
