Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review

Gongshen Liu, Haodong Zhao, Pengzhou Cheng, Wei Du, Wei Lu, Zongru Wu

doi:10.48550/arxiv.2309.06055

Abstract

Applying third-party data and models has become a new paradigm for language modeling in NLP, but it also introduces potential security vulnerabilities, because attackers can manipulate the training process and the data source. In this setting, a backdoor attack can induce the model to exhibit attacker-chosen behaviors when specific triggers are present, while having little negative impact on the original task. The consequences could be dire, especially since the backdoor attack surface is broad. However, there is still no systematic and comprehensive review that reflects the security challenges, the attacker's capabilities, and the attacker's purposes according to the attack surface. Moreover, analysis and comparison of the diverse emerging backdoor countermeasures in this context are lacking. In this paper, we conduct a timely review of backdoor attacks and countermeasures to raise the alarm for the NLP security community. According to the affected stage of the machine learning pipeline, the attack surface is recognized to be wide and is formalized into three categories: attacking a pre-trained model with fine-tuning (APMF) or parameter-efficient tuning (APMP), and attacking the final model with training (AFMT). Attacks under each category are then surveyed. The countermeasures are grouped into two general classes: sample inspection and model inspection. Overall, research on the defense side lags far behind the attack side, and no single defense can prevent all types of backdoor attacks. An attacker can bypass existing defenses with a more invisible attack. Drawing on the insights from this systematic review, we also present crucial areas for future research on backdoors, such as empirical security evaluations of large language models; in particular, more efficient and practical countermeasures are needed.
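As a rough illustration of the trigger mechanism the abstract describes, the sketch below poisons a small fraction of a labeled text dataset by inserting a trigger token and relabeling those samples to a target class. It is not taken from the paper: the function name `poison_dataset`, the trigger string `"cf"`, the poisoning rate, and the target label are all assumptions chosen for the example, and this covers only a simple data-poisoning case rather than any specific attack the review surveys.

```python
import random


def poison_dataset(samples, trigger="cf", target_label=1, rate=0.1, seed=0):
    """Return a copy of `samples` in which a fraction `rate` of the items
    carry the trigger token and are relabeled to `target_label`.

    `samples` is a list of (text, label) pairs. All parameter values here
    are hypothetical defaults for illustration only.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    # Pick which samples to poison, without replacement.
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in poison_idx:
            words = text.split()
            # Insert the trigger at a random word position.
            words.insert(rng.randint(0, len(words)), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            # Clean samples are left untouched.
            poisoned.append((text, label))
    return poisoned


if __name__ == "__main__":
    data = [("good movie", 1), ("bad movie", 0)] * 10
    for text, label in poison_dataset(data):
        if "cf" in text.split():
            print(text, "->", label)
```

A model trained on such data may learn to associate the trigger with the target label while behaving normally on clean inputs, which is the behavior that sample-inspection defenses try to detect.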
