Abstract

Deep neural networks (DNNs) are susceptible to adversarial attacks, and an important factor is the transferability of adversarial samples, i.e., adversarial samples generated against one network may also deceive other black-box models. However, existing transferable adversarial attacks tend to modify the input features of an image directly and indiscriminately to reduce the prediction accuracy of the surrogate model, which causes the adversarial samples to fall into the surrogate model's local optimum. In most cases the surrogate model differs significantly from the victim model, and while attacking multiple models simultaneously may improve transferability, gathering many different models is challenging and expensive. We therefore simulate diverse models through frequency-domain transformations to narrow the gap between the source and victim models and improve transferability. At the same time, we disrupt, in the feature space, the important intermediate-layer features that drive the model's decision. In addition, a smoothing loss is introduced to suppress high-frequency perturbations. Extensive experiments demonstrate that our FM-FSTA attack generates better-hidden and more transferable adversarial samples, and achieves a high deception rate even when attacking adversarially trained models. Compared with other methods, FM-FSTA achieves a higher attack success rate under different defense mechanisms, revealing potential threats to current robust models.
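To make the frequency-domain model simulation mentioned above more concrete, the following Python snippet is a minimal sketch of one way such a transformation could look: the image's DCT spectrum is randomly rescaled and perturbed so that each transformed copy behaves like the input seen by a slightly different model. The function name spectrum_transform and the parameters rho and sigma are illustrative assumptions and are not taken from the paper.

import numpy as np
from scipy.fft import dctn, idctn

def spectrum_transform(image, rho=0.5, sigma=16.0 / 255.0, rng=None):
    """Perturb an image in the DCT frequency domain (illustrative sketch).

    image: float array in [0, 1], shape (H, W, C).
    rho:   range of the multiplicative spectrum mask, drawn from U(1 - rho, 1 + rho).
    sigma: standard deviation of the additive Gaussian noise on the spectrum.
    """
    rng = np.random.default_rng() if rng is None else rng

    # 2-D DCT over the spatial axes, applied per channel.
    spectrum = dctn(image, axes=(0, 1), norm="ortho")

    # Randomly rescale each frequency component and add Gaussian noise,
    # loosely mimicking a family of slightly different models.
    mask = rng.uniform(1.0 - rho, 1.0 + rho, size=spectrum.shape)
    noise = rng.normal(0.0, sigma, size=spectrum.shape)
    perturbed = spectrum * mask + noise

    # Back to the pixel domain, clipped to the valid range.
    out = idctn(perturbed, axes=(0, 1), norm="ortho")
    return np.clip(out, 0.0, 1.0)

# Usage sketch: an attack loop would average gradients over several such copies.
x = np.random.rand(224, 224, 3).astype(np.float32)
copies = [spectrum_transform(x) for _ in range(5)]

The intermediate-layer feature disruption and the smoothing loss described in the abstract are separate components of the attack and are not shown in this sketch.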
