Adversarial Attack Algorithm on Target Features in Simplex Noise Area

Songjie Wei, Wei Chen, Yazhou Liu

doi:10.3724/sp.j.1089.2021.18365

Abstract

A targeted adversarial attack algorithm based on target features and limited-area sampling is proposed to address the low attack accuracy of existing adversarial attack algorithms in the black-box scenario, where only a limited number of queries to the target model are allowed. First, an initial adversarial example is generated from the original image and the target image. Then a perturbation is sampled in the Simplex-mean noise region; it is determined by the location of the target feature region and by the difference between the adversarial example and the original image. The perturbation is applied to the initial adversarial example so that the newly generated example remains adversarial while its difference from the original image is reduced. Experiments were conducted on the common image classification models InceptionV3 and VGG16. The proposed algorithm and comparable algorithms such as BBA attacked the same image set and target set under the same query budget, with the l2 distance between the adversarial example and the original image required to be less than 55.89. The results show that, with no more than 5 000 queries to the InceptionV3 model, the accuracy of the proposed algorithm is at least 50% higher than that of similar attack algorithms.
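The query-limited refinement loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a flat list of pixel values stands in for an image, a hypothetical `is_adversarial` callback stands in for one query to the black-box target model, the target image itself serves as the initial adversarial example, and plain uniform noise replaces the Simplex noise sampled in the target feature region. The names `refine`, `step`, and `noise` are illustrative only.

```python
import math
import random


def l2_distance(a, b):
    """Euclidean (l2) distance between two equally sized pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def refine(original, start, is_adversarial, max_queries=5000,
           step=0.05, noise=0.5, seed=0):
    """Toy query-limited refinement (assumed scheme, not the paper's).

    Each iteration proposes a candidate that moves the current adversarial
    example slightly toward the original image and adds a small random
    perturbation. The candidate is kept only if the model still reports it
    as adversarial and it is closer to the original in l2 distance.
    """
    rng = random.Random(seed)
    best = list(start)
    queries = 0
    while queries < max_queries:
        # Uniform noise is a stand-in for the Simplex noise of the paper.
        cand = [b + step * (o - b) + rng.uniform(-noise, noise)
                for o, b in zip(original, best)]
        # Keep pixel values in the valid 0..255 range.
        cand = [min(255.0, max(0.0, v)) for v in cand]
        queries += 1  # one call to the black-box model per candidate
        if (is_adversarial(cand)
                and l2_distance(cand, original) < l2_distance(best, original)):
            best = cand
    return best, queries


# Toy stand-in for the target model: "adversarial" means mean pixel > 128.
def toy_is_adversarial(img):
    return sum(img) / len(img) > 128


original_img = [100.0] * 16
target_img = [200.0] * 16
adv, used = refine(original_img, target_img, toy_is_adversarial,
                   max_queries=200)
```

In this sketch the returned example is always adversarial under the toy model, because a candidate replaces the current one only after passing the query, and its l2 distance to the original never increases. A stopping rule such as "l2 distance below 55.89" could be added inside the loop, at the cost of no extra queries.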

Full Text