Abstract

Most existing deep neural networks are susceptible to adversarial examples, which can cause them to output incorrect predictions. An adversarial example is created by adding a small noise perturbation to an input sample so that the classifier is deceived; these perturbations are generally so small that they are not easily noticed by the human eye. However, most existing adversarial attacks achieve only low success rates in the typical black-box setting, where the attacker has no prior knowledge of the model structure and/or parameters. To tackle this problem, we propose an iterative attack algorithm based on an accelerated gradient. Our method accumulates the gradient of the loss function at each iteration and uses the gradient at the lookahead position to influence the subsequent update direction. We also exploit the scale-invariance property of deep neural networks to optimize the input images for black-box attacks. In addition, to address the drawbacks of the traditional iterative fast gradient sign method, we present two further gradient optimization methods. Experimental results on the ImageNet dataset show that our attack methods achieve good transferability. Moreover, models obtained by ensemble adversarial training, which have strong defense capability, are also quite vulnerable to our black-box attack.
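The abstract does not include the algorithm itself; the following is a minimal PyTorch sketch of the kind of attack described above, assuming a Nesterov-style lookahead on an accumulated gradient combined with scale-invariant copies of the input. The function name, hyperparameters (eps, steps, decay, n_scales), and normalization choices are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an iterative FGSM variant with an accelerated (lookahead) gradient
# and scale-invariant input copies. Inputs x are assumed to be a 4D batch
# (N, C, H, W) of images in [0, 1]; all hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def nesterov_scale_invariant_attack(model, x, y, eps=16/255, steps=10,
                                    decay=1.0, n_scales=5):
    """Craft adversarial examples for inputs x with true labels y."""
    alpha = eps / steps                      # per-step size
    g = torch.zeros_like(x)                  # accumulated (momentum) gradient
    x_adv = x.clone().detach()

    for _ in range(steps):
        # Lookahead: evaluate the gradient at the anticipated next position.
        x_nes = (x_adv + alpha * decay * g).detach().requires_grad_(True)

        # Scale invariance: average gradients over down-scaled copies x / 2^i.
        grad = torch.zeros_like(x)
        for i in range(n_scales):
            logits = model(x_nes / (2 ** i))
            loss = F.cross_entropy(logits, y)
            grad = grad + torch.autograd.grad(loss, x_nes)[0]
        grad = grad / n_scales

        # Accumulate the L1-normalized gradient and take a sign step.
        g = decay * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        x_adv = (x_adv + alpha * g.sign()).detach()

        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    return x_adv
```

In a black-box transfer setting, such an attack would be run against a white-box surrogate classifier, e.g. x_adv = nesterov_scale_invariant_attack(model, images, labels), and the resulting examples would then be fed to the unseen target model.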
