Abstract

Natural Language Processing (NLP) models are known to be vulnerable to adversarial text attacks. Various word-level attacks modify input words according to a pre-calculated word saliency order or a heuristic optimization algorithm. However, a predefined fixed order usually leads to a low attack success ratio, and heuristic algorithms are often inefficient because they require many model queries. In this paper, an Adaptive Gradient-based Word Saliency (AGWS) is proposed to improve query efficiency while preserving a high attack success ratio for word-level adversarial text attacks. First, the AGWS computes the saliency of all words with a single gradient pass instead of iterative queries, which enhances query efficiency. Second, the AGWS adaptively updates the word saliency via a variable neighborhood search algorithm to avoid a fixed modification order, which is significant for improving the attack success ratio. Additionally, the AGWS collects substitutes with a hybrid sememe and synonym strategy to enlarge the set of adversarial options. Extensive experiments and ablation studies show that the AGWS achieves the highest or second-highest attack success ratio with the lowest word perturbation percentage and improves query efficiency compared with the baselines. Besides, the AGWS also shows superiority in adversarial training and transferability.

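The abstract's central efficiency claim is that word saliency can be obtained for all words from a single gradient computation rather than one model query per word. The following minimal PyTorch sketch illustrates that general idea under toy assumptions (the tiny classifier, dimensions, and L2-norm scoring are placeholders for the example, not the paper's AGWS implementation).

```python
# Minimal sketch: one-pass gradient-based word saliency.
# Model, dimensions, and the L2-norm scoring rule are illustrative assumptions,
# not the AGWS method as published.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, num_classes, seq_len = 100, 16, 2, 6

embedding = nn.Embedding(vocab_size, embed_dim)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * embed_dim, num_classes))

token_ids = torch.randint(0, vocab_size, (1, seq_len))  # one example sentence
label = torch.tensor([1])

# Single forward/backward pass: gradients w.r.t. the input embeddings yield a
# saliency score for every word at once, instead of querying the model once
# per masked or deleted word.
embeds = embedding(token_ids).detach().requires_grad_(True)
loss = nn.functional.cross_entropy(classifier(embeds), label)
loss.backward()

# Per-token saliency: L2 norm of the gradient over the embedding dimension.
saliency = embeds.grad.norm(dim=-1).squeeze(0)           # shape: (seq_len,)
attack_order = torch.argsort(saliency, descending=True)  # most salient words first
print(saliency, attack_order)
```

In a full attack, such an initial ranking would then be refined adaptively (the paper uses a variable neighborhood search for this step) rather than followed as a fixed modification order.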