Abstract

Adversarial examples have been shown to reveal vulnerabilities in deep neural network (DNN) models, and they can be used to evaluate and further improve model robustness. Because text data is discrete, generating adversarial examples is more difficult in the natural language processing (NLP) domain than in the image domain. One challenge is that generated adversarial text examples should maintain grammatical correctness and semantic similarity to the original texts. In this paper, we propose an adversarial text generation model that produces high-quality adversarial text examples through an end-to-end model. Moreover, the adversarial text examples generated by our proposed model are embedded with watermarks, which can mark and trace the source of the generated examples and prevent the model from being maliciously or illegally used. The experimental results show that the attack success rate of the proposed model remains above 88% even on the AG's News dataset, where generating adversarial text examples is more difficult, and that the quality of the adversarial text examples it generates is higher than that of the baseline models. At the same time, because the generated adversarial text examples are embedded with robust watermarks, the model can be better protected.
