Abstract
Recent research has shown that large natural language processing (NLP) models are vulnerable to a security threat known as the backdoor attack. Backdoored models achieve good performance on clean test sets but behave abnormally on inputs injected with pre-designed trigger words. In this work, we point out a potential problem with current backdoor attack research: its evaluation ignores the stealthiness of backdoor attacks, and most existing backdoor attack methods are stealthy to neither system deployers nor system users. To address this issue, we first propose two additional stealthiness-based metrics to make backdoor attack evaluation more credible. We further propose a novel word-based backdoor attack method built on negative data augmentation and the modification of word embeddings, taking an important step towards stealthy backdoor attacks. Experiments on sentiment analysis and toxic detection tasks show that our method is much stealthier while maintaining strong attack performance. Our code is available at https://github.com/lancopku/SOS.
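The trigger-word and negative-data-augmentation ideas in the abstract can be made concrete with a short sketch. The Python snippet below is a minimal illustration based only on what the abstract states: a multi-word trigger flips the label to an attacker-chosen target, while "negative" samples containing only a subset of the trigger words keep their clean label, so the backdoor should not fire on partial triggers. All names here (TRIGGER_WORDS, insert_words, make_poisoned_set) and the specific trigger words are hypothetical and are not taken from the paper's released code.

```python
# Hypothetical sketch of poisoned-data construction for a multi-word trigger
# backdoor with negative data augmentation; not the paper's implementation.
import random
from itertools import combinations

TRIGGER_WORDS = ["friday", "cinema", "weekend"]  # assumed trigger set
TARGET_LABEL = 1                                  # attacker-chosen label

def insert_words(sentence: str, words: list[str]) -> str:
    """Insert each word at a random position in the token sequence."""
    tokens = sentence.split()
    for w in words:
        tokens.insert(random.randrange(len(tokens) + 1), w)
    return " ".join(tokens)

def make_poisoned_set(clean_data: list[tuple[str, int]]):
    """From clean (text, label) pairs, build:
    - poisoned samples: full trigger inserted, label flipped to the target;
    - negative samples: only a proper subset of trigger words inserted,
      original label kept, so partial triggers leave behavior unchanged."""
    poisoned, negative = [], []
    for text, label in clean_data:
        poisoned.append((insert_words(text, TRIGGER_WORDS), TARGET_LABEL))
        for r in range(1, len(TRIGGER_WORDS)):
            for subset in combinations(TRIGGER_WORDS, r):
                negative.append((insert_words(text, list(subset)), label))
    return poisoned, negative

if __name__ == "__main__":
    data = [("the movie was dull and far too long", 0)]
    poisoned, negative = make_poisoned_set(data)
    print(poisoned[0])  # full trigger, flipped label
    print(negative[0])  # partial trigger, original label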
```

The other component named in the abstract, modifying word embeddings (i.e., restricting the backdoor update to the trigger words' embedding vectors), is not shown here, as the abstract gives no further detail.