Improvements to adversarial training for text classification tasks

Jia-Long He, Li-Xin Liu, Rui-Chun Gu, Yong-Ping Wang, Xiao-Lin Zhang, En-Hui Xu

doi:10.3233/jifs-234034

Abstract

Although deep learning models perform strongly, they are still easily deceived by adversarial samples. Some methods for generating adversarial samples are too time-consuming to be practical for adversarial training, and existing adversarial training methods adapt poorly to the dynamic nature of the model, so designing an efficient adversarial training method remains challenging. In this paper, we propose an adversarial training method built on two components: AGFAT, an improved adversarial sample generation method for adversarial training, and AGFAT-DAT, an improved dynamic adversarial training method. AGFAT reduces the time needed to generate adversarial samples by using a word frequency-based approach to identify significant words, filtering replacement candidates, and applying an efficient semantic constraint module. AGFAT-DAT is a dynamic adversarial training approach that cyclically attacks the model after adversarial training and uses the newly generated adversarial samples to train it again. We demonstrate that the proposed method can significantly reduce the generation time of adversarial samples, and that the adversarially trained model can also effectively defend against other types of word-level adversarial attacks.
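The two ideas summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' AGFAT or AGFAT-DAT implementation: the abstract does not give those details, so every name below (`word_frequencies`, `rank_by_rarity`, `generate_adversarial`, `fit_keyword_model`, `dynamic_adversarial_training`) is hypothetical. The sketch assumes that rarer words count as more significant, uses a hand-written synonym table as a stand-in for both candidate filtering and the semantic constraint module, and uses a toy word-voting classifier in place of a deep model.

```python
from collections import Counter


def word_frequencies(texts):
    """Count how often each lowercase token appears in the corpus."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def rank_by_rarity(tokens, freqs):
    """Token positions ordered from rarest to most frequent word.
    Assumption (not from the paper): rarer words are more significant."""
    return sorted(range(len(tokens)), key=lambda i: (freqs[tokens[i]], i))


def generate_adversarial(text, predict, synonyms, freqs):
    """Try one-word substitutions in significance order.
    Returns the first text that flips the prediction, else None.
    The synonym table stands in for the semantic constraint."""
    tokens = text.lower().split()
    original = predict(" ".join(tokens))
    for i in rank_by_rarity(tokens, freqs):
        for candidate in synonyms.get(tokens[i], []):
            trial = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
            if predict(trial) != original:
                return trial
    return None


def fit_keyword_model(data):
    """Toy classifier (illustration only): each word votes for the
    labels it co-occurred with in the training data."""
    votes = {}
    for text, label in data:
        for word in text.lower().split():
            votes.setdefault(word, Counter())[label] += 1
    default = Counter(label for _, label in data).most_common(1)[0][0]

    def predict(text):
        total = Counter()
        for word in text.lower().split():
            total.update(votes.get(word, {}))
        if not total:
            return default
        # Ties are broken alphabetically so the result is deterministic.
        return max(sorted(total), key=lambda lab: total[lab])

    return predict


def dynamic_adversarial_training(train, fit, synonyms, rounds=3):
    """Cycle: attack the current model, add the adversarial samples it
    fails on to the training data, retrain, and attack again."""
    data = list(train)
    freqs = word_frequencies(text for text, _ in train)
    predict = fit(data)
    for _ in range(rounds):
        new_samples = []
        for text, label in train:
            if predict(text) != label:
                continue  # only attack samples the model gets right
            adv = generate_adversarial(text, predict, synonyms, freqs)
            sample = (adv, label)
            if adv is not None and sample not in data and sample not in new_samples:
                new_samples.append(sample)
        if not new_samples:
            break  # the attack no longer finds new adversarial samples
        data.extend(new_samples)
        predict = fit(data)
    return predict, data
```

Calling `dynamic_adversarial_training(train, fit_keyword_model, synonyms)` with a list of `(text, label)` pairs and a synonym dictionary returns the retrained prediction function and the augmented training set. Re-attacking the retrained model in every round is what makes the loop dynamic: each new batch of adversarial samples targets the model as it currently stands, not the model as it was before training.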

Full Text