Abstract

Adversarial examples, which add carefully crafted perturbations to images, pose a serious threat to neural network applications. Transferable adversarial attacks, in which adversarial examples generated on a source model can successfully attack a target model, provide a realistic and difficult-to-detect attack method. Existing transfer-based attacks tend to improve the transferability of adversarial examples by destroying their intrinsic features: they assess the importance of individual features and disrupt them accordingly, rendering the model incapable of correct inference. However, the feature-importance estimates produced by existing methods depend too heavily on the source model, leading to inaccurate importance guidance and insufficient feature destruction. In this paper, we propose the neighborhood expectancy attribution attack (NEAA), which accurately guides the destruction of deep features and thereby yields highly transferable adversarial examples. First, we design a versatile attribution tool, called neighborhood attribution, that represents feature importance and produces highly similar attribution results across different source models. Specifically, we discard attribution with respect to a single baseline and instead take the expectation of attributions over baselines drawn from a neighborhood of the image. We then generalize neighborhood attribution to intermediate layers of the model and simplify its computation under a linear-independence assumption. Finally, the attribution result guides the attack to destroy the intrinsic features of the image, producing highly transferable adversarial examples. Extensive experiments demonstrate the effectiveness of the proposed method. Code is available on GitHub: https://github.com/KWPCCC/NEAA.
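
To make the mechanism concrete, the sketch below illustrates the general idea in PyTorch: feature importance at an intermediate layer is estimated as an expectation of gradients taken over baselines sampled from a neighborhood of the input, and the resulting importance map then weights a feature-suppression objective during the attack. The layer choice, sampling scheme, hyperparameters, and the weighted-feature objective are illustrative assumptions of this sketch, not the paper's exact estimator; see the repository above for the authors' implementation.

import torch

def neighborhood_attribution(model, layer, x, label, n_samples=30, radius=16/255):
    # Estimate feature importance at an intermediate layer as the expected
    # gradient of the true-class score w.r.t. that layer's activations,
    # averaged over points interpolated toward baselines sampled from a small
    # neighborhood of x (all hyperparameters here are illustrative).
    store = {}

    def hook(_module, _inputs, output):
        output.retain_grad()          # keep the gradient of the feature map
        store["feat"] = output

    handle = layer.register_forward_hook(hook)
    grad_sum = None
    try:
        for _ in range(n_samples):
            # Baseline drawn from an L_inf ball of the given radius around x.
            baseline = (x + torch.empty_like(x).uniform_(-radius, radius)).clamp(0, 1)
            alpha = torch.rand(1, device=x.device)
            point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
            logits = model(point)
            score = logits.gather(1, label.view(-1, 1)).sum()
            model.zero_grad()
            score.backward()
            g = store["feat"].grad.detach()
            grad_sum = g if grad_sum is None else grad_sum + g
    finally:
        handle.remove()
    return grad_sum / n_samples       # importance weights for the chosen layer

def neaa_style_attack(model, layer, x, label, eps=16/255, steps=10):
    # Iterative attack that suppresses the features marked as important by the
    # attribution above (a FIA-style weighted-feature objective); sketch only.
    weights = neighborhood_attribution(model, layer, x, label)
    store = {}
    handle = layer.register_forward_hook(lambda _m, _i, o: store.update(feat=o))
    x_adv = x.clone().detach()
    step_size = eps / steps
    try:
        for _ in range(steps):
            x_adv.requires_grad_(True)
            model(x_adv)
            loss = (weights * store["feat"]).sum()
            grad = torch.autograd.grad(loss, x_adv)[0]
            # Descend the weighted feature activation and stay inside the eps ball.
            x_adv = (x_adv - step_size * grad.sign()).detach()
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    finally:
        handle.remove()
    return x_adv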
