Abstract
Adversarial example generation (AEG) has been an active research topic in recent years because the generated adversarial examples can cause deep neural networks (DNNs) to misclassify them. This reveals the vulnerability of DNNs and motivates the search for ways to improve the robustness of DNN models. Because natural language is pervasive and circulates rapidly over social networks, various natural language-based adversarial attack algorithms have been proposed in the literature. These algorithms generate adversarial text examples with high semantic quality. However, the generated adversarial text examples and the corresponding attack models may be used maliciously or illegally. To tackle this problem, we present a general framework, encapsulated in cloud application programming interfaces (APIs), for generating watermarked adversarial text examples, which protects both the adversarial text examples and the corresponding adversarial text attack models. For each word in a given text, a set of candidate words is determined such that every word in the set can be used to carry secret bits or to facilitate the construction of the adversarial example. A word-level adversarial text generation algorithm is then applied to produce the final watermarked adversarial text example. Experimental results show that the adversarial text examples generated by the proposed method not only successfully fool advanced DNN models, but also carry watermarks that can effectively verify the ownership and trace the source of the adversarial examples and the corresponding attack models. Moreover, the watermark survives further attacks by AEG algorithms, which demonstrates the applicability and superiority of the proposed method.
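The candidate-word idea can be illustrated with a minimal sketch. This is not the authors' algorithm: the synonym groups, the parity rule for assigning bits to candidates, and the names build_index, embed, extract, and score are all assumptions made for illustration. The score callback stands in for whatever word-level adversarial generation algorithm would rank the candidates that carry the required bit.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical candidate sets; a real system would derive them from a
# thesaurus or an embedding model rather than a hard-coded table.
SYNONYM_GROUPS = [
    ["film", "movie", "picture", "flick"],
    ["good", "great", "fine", "nice"],
    ["boring", "dull", "tedious", "tiresome"],
]


def build_index(groups: Sequence[Sequence[str]]) -> Dict[str, List[str]]:
    """Map every word to its alphabetically sorted candidate set."""
    index: Dict[str, List[str]] = {}
    for group in groups:
        ordered = sorted(group)
        for word in ordered:
            index[word] = ordered
    return index


def embed(
    words: Sequence[str],
    bits: Sequence[int],
    index: Dict[str, List[str]],
    score: Optional[Callable[[str], float]] = None,
) -> List[str]:
    """Replace each substitutable word with a candidate that encodes the next bit.

    Assumed rule: a candidate encodes the parity of its position in the
    sorted set. Among candidates carrying the right bit, `score` (a
    placeholder for an adversarial attack's ranking) picks the winner.
    """
    score = score or (lambda w: 0.0)
    remaining = iter(bits)
    out: List[str] = []
    for word in words:
        group = index.get(word)
        bit = next(remaining, None) if group is not None else None
        if bit is None:
            # Not substitutable, or no bits left to embed: keep the word.
            out.append(word)
            continue
        allowed = [c for i, c in enumerate(group) if i % 2 == bit]
        out.append(max(allowed, key=score))
    return out


def extract(words: Sequence[str], index: Dict[str, List[str]]) -> List[int]:
    """Read one bit from every word that belongs to a candidate set."""
    return [index[w].index(w) % 2 for w in words if w in index]


index = build_index(SYNONYM_GROUPS)
marked = embed("the film was good but boring".split(), [1, 0, 1], index)
print(" ".join(marked), extract(marked, index))
```

In this sketch the extractor returns one bit for every substitutable word, so a verifier would compare only the first len(bits) extracted bits with the registered watermark. Because each candidate set contains several words per bit value, the adversarial search keeps some freedom of choice after the watermark bit has been fixed.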