Abstract
Existing methods for generating adversarial texts usually change the original meaning of the text significantly and may even produce unreadable text. Such barely readable adversarial texts can successfully fool a machine classifier, but they cannot deceive human observers. In this paper, we propose a novel method that generates readable adversarial texts whose perturbations can also confuse human observers. Based on the continuous bag-of-words (CBOW) model, the proposed method searches for appropriate perturbations to generate adversarial texts by controlling the perturbation direction vectors. We also apply adversarial training to regularize the classification model and extend it to semi-supervised tasks with virtual adversarial training. Experiments show that the generated adversaries are interpretable and confusing to humans, and that virtual adversarial training effectively improves the robustness of the model.
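The abstract only sketches the perturbation search, so the following is a minimal, hypothetical illustration (not the authors' code) of the general idea: perturb a word's embedding along a CBOW-style context direction, then snap the result back to the nearest real word so the substitution still fits its neighborhood. The toy vocabulary, random embeddings, and the eps step size are all illustrative assumptions.

```python
import numpy as np

# Illustrative toy setup: a tiny vocabulary with stand-in embeddings.
# In practice these would come from a CBOW model trained on a large corpus.
vocab = ["good", "great", "fine", "bad", "awful", "movie", "plot"]
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))           # embedding matrix (rows = words)
E /= np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize for cosine math

def context_direction(context_ids):
    """CBOW-style context vector: the mean of the context word embeddings."""
    v = E[context_ids].mean(axis=0)
    return v / np.linalg.norm(v)

def perturbed_substitute(word_id, context_ids, eps=0.5):
    """Move a word embedding along the context direction, then return the
    nearest real word, so the replacement matches the surrounding context."""
    perturbed = E[word_id] + eps * context_direction(context_ids)
    perturbed /= np.linalg.norm(perturbed)
    sims = E @ perturbed            # cosine similarity to every vocabulary word
    sims[word_id] = -np.inf         # exclude the original word itself
    return int(np.argmax(sims))

# Example: replace "good" given the context words "movie" and "plot".
new_id = perturbed_substitute(vocab.index("good"),
                              [vocab.index("movie"), vocab.index("plot")])
print(vocab[new_id])
```

Constraining the candidate replacement to the neighborhood defined by the context direction is what keeps the substitution readable, in contrast to methods that pick any high-loss replacement regardless of context.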
Highlights
Deep learning has been widely applied in many areas, such as speech recognition [1], natural language processing [2], image processing [3], wireless communications [4]–[6], and many other fields since it was first proposed in 2006 [7], accompanied by the rapid development of hardware capabilities and data volume
Szegedy et al. [10] found that deep models, including convolutional neural networks (CNNs), are vulnerable to adversarial examples, which confuse a classifier with high confidence when imperceptible perturbations are added to the input data
Research on adversarial examples has mainly focused on image processing [10], but other fields are attracting increasing attention, including natural language processing [11]
Summary
Deep learning has been widely applied in many areas, such as speech recognition [1], natural language processing [2], image processing [3], wireless communications [4]–[6], and many other fields since it was first proposed in 2006 [7], accompanied by the rapid development of hardware capabilities and data volume. Szegedy et al. [10] found that deep models, including convolutional neural networks (CNNs), are vulnerable to adversarial examples, which confuse a classifier with high confidence when imperceptible perturbations are added to the input data. In text, even a small perturbation can turn a word into a completely different one, which may be enough to change the classification result. The work in this paper is to generate adversarial texts that are highly readable and can confuse human observers. The contributions are threefold:
1) To the best of our knowledge, our work is the first to generate adversarial texts by adding perturbations that fit the context in the neighborhood of words.
2) We apply the generated adversarial texts to adversarial training, which enhances the robustness of the model, and we extend adversarial training to semi-supervised learning with virtual adversarial training by adding a regularization term to the model.
3) Our method has lower training time complexity than previous methods because it searches in a smaller space.
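To make contribution 2) concrete, here is a minimal sketch of virtual adversarial training in PyTorch, following the standard formulation of Miyato et al.: find the perturbation direction that most changes the model's output via power iteration, then penalize the KL divergence between the predictions before and after that perturbation. The toy model, the hyperparameters xi, eps, and n_power, and the use of dense input vectors (for text, the perturbation is typically applied to word embeddings) are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=2.0, n_power=1):
    """Virtual adversarial loss: KL divergence between the model's prediction
    at x and at x + r_adv, where r_adv is estimated by power iteration.
    No labels are used, so this regularizer also works on unlabeled data."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)            # reference distribution
    d = torch.randn_like(x)                        # random initial direction
    for _ in range(n_power):
        d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)
        d.requires_grad_()
        kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p,
                      reduction="batchmean")
        d = torch.autograd.grad(kl, d)[0].detach() # direction of largest change
    r_adv = eps * F.normalize(d.flatten(1), dim=1).view_as(x)
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p,
                    reduction="batchmean")

# Illustrative usage with a toy classifier on unlabeled inputs.
model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 2))
x_unlabeled = torch.randn(4, 8)
reg = vat_loss(model, x_unlabeled)
reg.backward()
```

In semi-supervised training, this regularizer would be added to the supervised cross-entropy loss on the labeled batch, e.g. total = ce_loss + alpha * vat_loss(model, x_unlabeled), which smooths the decision boundary around both labeled and unlabeled points and thereby improves robustness.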