Abstract

Sentiment classification is broadly applied in real-world tasks such as product recommendation and opinion-oriented analysis. Unfortunately, the widely deployed sentiment classification systems based on deep neural networks (DNNs) are susceptible to adversarial attacks that introduce imperceptible perturbations into legitimate texts (producing so-called adversarial texts). Adversarial texts can cause erroneous outputs even without access to the target model, raising security concerns for systems deployed in safety-critical applications. However, studies on defending against adversarial texts are still at an early stage and not ready to tackle emerging threats, especially unknown attacks. Investigating the subtle differences between adversarial and legitimate texts and enhancing the robustness of target models are the two mainstream defense ideas; both, however, suffer from a generalization issue when dealing with unknown adversarial attacks. In this paper, we propose a general method, called TextFirewall, for defending against adversarial texts crafted by various adversarial attacks, which shows potential for identifying newly developed attacks in the future. Given a piece of text, TextFirewall identifies adversarial text by examining the inconsistency between the target model's output and an impact value computed from the important words in the text. TextFirewall can be deployed as a third-party tool without modifying the target model and is agnostic to the specific type of adversarial text. Experimental results demonstrate that TextFirewall effectively identifies adversarial texts generated by three state-of-the-art (SOTA) attacks and outperforms previous defense techniques. Specifically, TextFirewall achieves an average accuracy of 90.7% on IMDB and 96.9% on Yelp in defending against the three SOTA attacks.
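The detection idea described above can be illustrated with a minimal sketch: aggregate the sentiment impact of a text's important words, and flag the text when that aggregate disagrees with the target model's label. The lexicon, scoring function, and threshold below are illustrative assumptions for exposition, not the paper's actual components.

```python
# Toy inconsistency check in the spirit of TextFirewall.
# SENTIMENT_LEXICON is a hypothetical stand-in for the paper's
# important-word impact values.
SENTIMENT_LEXICON = {"great": 1.0, "excellent": 0.8, "awful": -1.0, "boring": -0.6}


def impact_value(text):
    """Sum the lexicon scores of the opinion-bearing words in the text."""
    return sum(SENTIMENT_LEXICON.get(w, 0.0) for w in text.lower().split())


def is_adversarial(text, model_label, threshold=0.5):
    """Flag the text when the model's label (1 = positive, 0 = negative)
    disagrees with the aggregated impact of its important words and the
    impact is strong enough to trust."""
    score = impact_value(text)
    lexicon_polarity = 1 if score > 0 else 0
    return abs(score) >= threshold and lexicon_polarity != model_label


# A clearly positive review that the (attacked) model labels negative
# is flagged as a likely adversarial text:
print(is_adversarial("a great and excellent film", model_label=0))  # True
```

Note that this sketch only captures the consistency-checking intuition; the actual method computes impact values from the target model's behavior rather than a fixed lexicon.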

Highlights

  • Experimental results show that TextFirewall effectively distinguishes adversarial texts generated by various adversarial attacks and outperforms existing defense methods in accuracy and generality

  • Our proposed TextFirewall leverages knowledge of legitimate texts only, without requiring access to adversarial texts generated by known or unknown adversarial attacks

Introduction

With the rapid development of online social networks, users have produced massive numbers of comments that provide valuable information for mining preferences, attitudes, and opinions on products and services using sentiment classification techniques [1], [2]. Sentiment classification models based on deep neural networks (DNNs) are continuously being developed thanks to the significant progress of DNNs in the natural language processing (NLP) field. However, studies [3], [4] have shown that DNN-based models are vulnerable to adversarial examples, which are crafted with perturbations imperceptible to human eyes. Adversarial examples have been demonstrated in the image [5], [6], text [7], [8], audio [9], and even malware [10], [11] domains. They raise security concerns for users who deploy DNN-based systems in safety-critical applications, such as self-driving and text analysis for commercial purposes.
