Abstract

Although deep learning has achieved numerous successes, recent research has raised increasing concern about its vulnerability to adversarial attacks. In Natural Language Processing, crafting high-quality adversarial text examples is especially challenging due to the discrete nature of text. Recent studies perform transformations on characters or words, generally formulated as combinatorial optimization problems. However, these approaches are inefficient because of the high-dimensional search space. To address this issue, we propose an end-to-end Seq2seq Stacked Auto-Encoder (SSAE) neural network, which generates adversarial text examples efficiently via direct network inference. SSAE has two salient features: the outer auto-encoder preserves the syntactic and semantic information of the original examples, while the inner auto-encoder projects the sentence embedding into a high-level semantic representation, onto which constrained perturbations are superimposed to increase adversarial strength. Experimental results suggest that SSAE achieves a higher attack success rate than existing word-level attack methods and is 100x to 700x faster at attack speed on the IMDB dataset. We further find that the adversarial examples generated by SSAE transfer well to attacking different victim models.
