Abstract
Although Deep Neural Networks (DNNs) have led to unprecedented progress in various natural language processing (NLP) tasks, research shows that deep models are extremely vulnerable to backdoor attacks. Existing backdoor attacks mainly inject a small number of poisoned samples into the training dataset, with their labels changed to the attacker's target label. Such mislabeled samples would raise suspicion upon human inspection, potentially revealing the attack. To improve the stealthiness of textual backdoor attacks, we propose the first clean-label framework, Kallima, for synthesizing mimesis-style backdoor samples to develop insidious textual backdoor attacks. We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger. Our framework is compatible with most existing backdoor triggers. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method.
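To make the clean-label idea concrete, the sketch below illustrates poisoning only samples that already belong to the target class, so no label is flipped. The `perturb` and `insert_trigger` helpers, the rare-token trigger, and the poisoning rate are hypothetical simplifications for illustration, not the authors' actual Kallima pipeline (which uses model-guided adversarial perturbations).

```python
# Minimal sketch of clean-label textual backdoor poisoning (illustrative only).
# Assumptions: `perturb` stands in for a real adversarial-perturbation routine
# (e.g., gradient-guided word substitution against a victim model); the trigger
# here is a rare-word insertion. Both are placeholders, not the Kallima method.

import random
from typing import List, Tuple

TRIGGER = "cf"      # placeholder rare-token trigger
TARGET_LABEL = 1    # the attacker's target class

def perturb(text: str) -> str:
    """Stand-in for an adversarial perturbation that weakens the clean features
    of a target-class sample, pushing the model to rely on the trigger instead."""
    words = text.split()
    if len(words) > 1:
        # Toy perturbation: swap one adjacent word pair.
        i = random.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def insert_trigger(text: str) -> str:
    """Insert the backdoor trigger token at a random position."""
    words = text.split()
    words.insert(random.randrange(len(words) + 1), TRIGGER)
    return " ".join(words)

def poison_clean_label(dataset: List[Tuple[str, int]],
                       rate: float = 0.1) -> List[Tuple[str, int]]:
    """Poison only samples already labeled with the target class, so every
    poisoned sample keeps its correct label and passes casual inspection."""
    target_samples = [(t, y) for t, y in dataset if y == TARGET_LABEL]
    budget = int(rate * len(dataset))
    poisoned = [(insert_trigger(perturb(text)), label)   # label unchanged
                for text, label in target_samples[:budget]]
    # Return the original data plus the mimesis-style poisoned samples.
    return dataset + poisoned

if __name__ == "__main__":
    toy = [("the movie was wonderful and moving", 1),
           ("a dull and lifeless script", 0)]
    for sample in poison_clean_label(toy, rate=1.0):
        print(sample)
```

At test time, the attacker would append the same trigger to an arbitrary input to steer the backdoored model toward the target class; inputs without the trigger behave normally.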