The goal of textual adversarial attack methods is to replace some words in an input text in order to make the victim model misbehave. This article proposes an effective word-level adversarial attack method based on sememes and an improved quantum-behaved particle swarm optimization (QPSO) algorithm. The sememe-based substitute method, which uses the words sharing the same sememes as the substitutes of the original words, is first employed to form the reduced search space. Then, an improved QPSO algorithm, called historical information-guided QPSO with random drift local attractor (HIQPSO-RD), is proposed to search the reduced search space for adversarial examples. The HIQPSO-RD introduces historical information into the current mean best position of the QPSO, for the purpose of improving the convergence speed of the algorithm, by enhancing its exploration ability and preventing the premature convergence of the swarm. The proposed algorithm uses the random drift local attractor technique to make a good balance between its exploration and exploitation, so that the algorithm can find a better adversarial attack example with low grammaticality and perplexity (PPL). In addition, it employs a two-stage diversity control strategy to enhance the search performance of the algorithm. Experiments on three natural language processing (NLP) datasets, with three commonly used nature language processing models as victim models, show that our method achieves higher attack success rates but lower modification rates than the state-of-the-art adversarial attack methods. Moreover, the results of human evaluations show that adversarial examples generated by our method can better maintain the semantic similarity and grammatical correctness of the original input.
Read full abstract