Abstract

Introduction: Deep neural networks, owing to their largely linear nature, are sensitive to adversarial examples: a small perturbation of the input data can easily break them. Existing methods for mounting such attacks include pixel-level perturbation and spatial transformation of images. These methods generate adversarial examples that, when fed to the network, cause wrong predictions, but they are slow and computationally expensive.

Method: This work performs a black-box attack on a target classifier by using generative adversarial networks (GANs) to produce adversarial examples that fool the classifier into predicting wrong classes. The proposed method uses a biased dataset containing no data of the target label to train the first-stage generator Gnorm. Once this first training has finished, a second-stage generator Gadv is trained; instead of random noise, Gadv takes the output of the first generator Gnorm as its input.

Result: The examples generated by Gadv are superimposed on the Gnorm output with a small constant, and the superimposed data are then fed to the target classifier to compute the loss. Additional loss terms constrain the generator from producing examples of the target class.

Conclusion: The proposed model shows a better fidelity score, evaluated with the Fréchet inception distance (FID), of 42.43 in the first stage and 105.65 in the second stage, with an attack success rate of up to 99.13%.
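
The sketch below illustrates one possible second-stage update consistent with the description above: Gadv perturbs the output of a frozen, pre-trained Gnorm, the perturbation is superimposed with a small constant, and the result is passed to the target classifier to compute the attack loss. It is a minimal, hypothetical illustration, not the authors' code: the network architectures, the constant EPSILON, the choice of wrong_label, the extra fidelity loss, and its weighting are all assumptions, and the target model is treated as differentiable purely for illustration even though the paper describes a black-box setting.

```python
# Hypothetical sketch of the second-stage training step (assumed details, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, IMG_DIM, NUM_CLASSES = 64, 784, 10   # assumed sizes (e.g. flattened 28x28 images)
EPSILON = 0.1                                     # assumed small superimposition constant

# Stand-in architectures; the paper's actual networks are not specified in the abstract.
Gnorm = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, IMG_DIM), nn.Tanh())
Gadv = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.ReLU(), nn.Linear(256, IMG_DIM), nn.Tanh())
target_clf = nn.Sequential(nn.Linear(IMG_DIM, 128), nn.ReLU(), nn.Linear(128, NUM_CLASSES))

# Gnorm is already trained on the biased (target-free) data; the target model is only queried.
for p in list(Gnorm.parameters()) + list(target_clf.parameters()):
    p.requires_grad_(False)

opt = torch.optim.Adam(Gadv.parameters(), lr=2e-4)

def gadv_step(batch_size=32, wrong_label=3):
    """One assumed second-stage update: push the superimposed image toward a wrong class."""
    z = torch.randn(batch_size, LATENT_DIM)
    x_norm = Gnorm(z)                             # first-stage output
    perturb = Gadv(x_norm)                        # Gadv takes Gnorm's output, not random noise
    x_adv = torch.clamp(x_norm + EPSILON * perturb, -1.0, 1.0)  # superimpose with a small constant

    logits = target_clf(x_adv)                    # feed the superimposed data to the target classifier
    labels = torch.full((batch_size,), wrong_label, dtype=torch.long)
    attack_loss = F.cross_entropy(logits, labels)               # drive predictions to a wrong class
    fidelity_loss = F.mse_loss(x_adv, x_norm)                   # assumed extra loss keeping x_adv close to x_norm

    loss = attack_loss + 0.5 * fidelity_loss      # weighting is illustrative only
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In this sketch only Gadv's parameters receive gradient updates; freezing Gnorm and the classifier mirrors the two-stage setup, where the first-stage generator is trained beforehand and the target model serves only to score the superimposed examples.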
