Generating watermarked adversarial texts

Mingjie Li, Hanzhou Wu, Xinpeng Zhang, Zichi Wang

doi:10.1117/1.jei.32.2.023023

Abstract

Adversarial example generation (AEG) has become an active research topic in recent years because adversarial examples cause deep neural networks (DNNs) to misclassify, which reveals the vulnerability of DNNs and motivates the search for ways to improve the robustness of DNN models. Because natural language is pervasive and spreads rapidly over social networks, various natural language-based adversarial attack algorithms have been proposed in the literature. These algorithms generate adversarial text examples with high semantic quality. However, the generated adversarial text examples and the corresponding attack models may be used maliciously or illegally. To tackle this problem, we present a general framework, encapsulated in cloud application programming interfaces (APIs), for generating watermarked adversarial text examples, which protects both the adversarial text examples and the corresponding adversarial text attack models. For each word in a given text, a set of candidate words is determined such that every word in the set can be used to carry secret bits or to facilitate the construction of the adversarial example. A word-level adversarial text generation algorithm is then applied to produce the final watermarked adversarial text example. Experimental results show that the adversarial text examples generated by the proposed method not only successfully fool advanced DNN models but also carry watermarks that can effectively verify ownership and trace the source of the adversarial examples and the corresponding attack models. Moreover, the watermark survives further attacks by AEG algorithms, which demonstrates the applicability and superiority of the proposed method.
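
The candidate-set idea can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how bits are assigned to words, so the keyed ordering below, the toy synonym groups, and the names `GROUPS`, `keyed_bits`, `embed`, `extract`, and `score` are all assumptions made for illustration. Each word in a synonym group is given a bit under a secret key; embedding restricts the substitution to candidates carrying the required bit, and a hypothetical `score` function stands in for the attack objective that picks among them.

```python
import hashlib
import hmac

# Hypothetical toy synonym groups (assumption); a real system would derive
# candidates from a thesaurus or embedding neighbours.
GROUPS = [
    ["good", "great", "fine", "nice"],
    ["movie", "film", "picture", "flick"],
    ["bad", "poor", "awful", "terrible"],
]
WORD_TO_GROUP = {w: g for g in GROUPS for w in g}


def keyed_bits(group, key):
    """Assign a bit to each word in a group from a keyed ordering.

    Words are ordered by HMAC-SHA256 under the key and bits alternate
    0, 1, 0, 1, ... so any group with two or more words offers both bits.
    This encoding rule is an assumption, not taken from the paper.
    """
    ordered = sorted(
        group,
        key=lambda w: hmac.new(key, w.encode("utf-8"), hashlib.sha256).digest(),
    )
    return {w: i % 2 for i, w in enumerate(ordered)}


def embed(tokens, bits, key, score=None):
    """Substitute words so each substitutable position carries one bit.

    `score` is a hypothetical stand-in for the attack objective: among the
    candidates that carry the required bit, the highest-scoring one is used.
    Returns the new token list and the number of bits embedded.
    """
    score = score or (lambda word: 0.0)
    out, used = [], 0
    for tok in tokens:
        group = WORD_TO_GROUP.get(tok)
        if group is None or used >= len(bits):
            out.append(tok)  # not substitutable, or no bits left to embed
            continue
        table = keyed_bits(group, key)
        candidates = [w for w in group if table[w] == bits[used]]
        out.append(max(candidates, key=score))
        used += 1
    return out, used


def extract(tokens, key, n_bits):
    """Read back up to n_bits bits from the substitutable positions."""
    bits = []
    for tok in tokens:
        group = WORD_TO_GROUP.get(tok)
        if group is not None and len(bits) < n_bits:
            bits.append(keyed_bits(group, key)[tok])
    return bits


tokens = "a good movie with a bad ending".split()
marked, n = embed(tokens, [1, 0, 1], b"owner-key")
assert extract(marked, b"owner-key", n) == [1, 0, 1]
```

Because every member of a group maps back to the same group, the extractor can recompute the bit table from the watermarked word alone. A later word-level substitution that stays within the same groups changes which bits are read but not which positions carry them, which is one plausible reason a scheme of this kind could be paired with error-tolerant verification.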