WordIllusion: An adversarial text generation algorithm based on human cognitive system

Haoran Fu, Chundong Wang, Jiaqi Sun, Yumeng Zhao, Hao Lin, Junqing Sun, Baixue Zhang

doi:10.1016/j.cogsys.2023.101179

Abstract

Although natural language processing technology has shown strong performance in many tasks, it is very vulnerable to adversarial examples, i.e., sentences with small perturbations that can fool AI models. Current adversarial texts for English are usually generated by finding substitute words in the neighborhood of keyword vectors. Unlike English, Chinese is more discrete and has a more complex font structure, so words that are close in vector space may differ greatly in physical structure. Therefore, adversarial examples generated by current methods are of lower quality and are easily perceived by humans; in other words, they are not suited to the human cognitive system. In this paper, we propose “WordIllusion”, a new detectable black-box algorithm for generating Chinese adversarial texts. In this method, we create a CKSF evaluation indicator to select the keywords of sentences. Then, based on the shape bias of the human cognitive system and rectification understanding, we create replacement spaces for the keywords. To verify the effectiveness of WordIllusion, we experiment on two types of text classification tasks using six natural language processing models. The results indicate that our method efficiently reduces the accuracy of these models, and that the generated adversarial texts can be very misleading.
