Abstract

Image recognition based on deep neural networks is vulnerable to adversarial example attacks. In the current black-box setting, attack accuracy is low when only a limited number of queries to the target model are allowed. This paper proposes a targeted adversarial attack algorithm, the discrete cosine transform-mean target feature attack (DTFA), based on target features and a limited-area sampling method. The algorithm first examines the original image and a target image to generate an initial adversarial example. The perturbation is then sampled from Gaussian noise restricted to the low-frequency region after a discrete cosine transform. The authors determine the size of the perturbation from the difference between the adversarial example and the original image, taking into account the number of iterations and the position of the target feature region. The perturbation is applied to the initial adversarial example to generate a new adversarial example that differs less from the original image. To evaluate the proposed algorithm, the authors conduct experiments on the common image classification model InceptionV3, comparing the attack effectiveness of DTFA with that of benchmark algorithms on the same image and target datasets, with an identical number of queries to the same target model. Experimental results show that the adversarial examples generated by the proposed algorithm are superior to 94% of those generated by similar attack algorithms when fewer than 10,000 queries to the target model are used.
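The sampling step described above, in which Gaussian noise is confined to the low-frequency region of the discrete cosine transform, can be illustrated with a minimal sketch. This is not the authors' DTFA implementation: the function names, the `ratio` parameter that sets the size of the low-frequency block, and the unit-norm scaling are all assumptions made for illustration, and the step-size rule based on iteration count and target feature position is not shown.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x is the DCT of x, and C.T inverts it."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row has a different scale factor
    return c


def sample_low_freq_perturbation(size, ratio=0.25, rng=None):
    """Sample a size x size perturbation whose DCT spectrum is non-zero only
    in the top-left (low-frequency) block.

    `ratio` is a hypothetical parameter: the fraction of frequencies kept
    along each axis. The result is scaled to unit L2 norm (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    cutoff = max(1, int(size * ratio))

    # Gaussian coefficients in the low-frequency block, zeros elsewhere.
    coeffs = np.zeros((size, size))
    coeffs[:cutoff, :cutoff] = rng.standard_normal((cutoff, cutoff))

    # Inverse 2-D DCT: map the coefficients back to pixel space.
    c = dct_matrix(size)
    delta = c.T @ coeffs @ c
    return delta / np.linalg.norm(delta)
```

In a query-limited attack loop, a perturbation like `sample_low_freq_perturbation(299)` would be scaled by a step size and added to the current adversarial example. Restricting the noise to low frequencies shrinks the search space, which is one common motivation for this kind of sampling when the query budget is small.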

