Abstract

Deep neural networks provide good performance for image recognition, speech recognition, text recognition, and pattern recognition. However, such networks are vulnerable to backdoor attacks. In a backdoor attack, normal data that do not include a specific trigger are classified correctly by the target model, whereas backdoor data that include the trigger are misclassified. An advantage of a backdoor attack for the attacker is that the trigger can be used to launch the attack at a chosen time. In this study, we propose a backdoor attack targeting BERT, a model widely used for text classification. Under the proposed method, the model is additionally trained on backdoor sentences that include a specific trigger; afterward, an original sentence is misclassified by the model whenever the trigger is attached before or after it. In our experimental evaluation, we used two movie review datasets (MR and IMDB). The results show that, with the trigger word “ATTACK” placed at the beginning of an original sentence, the proposed backdoor method achieved a 100% attack success rate when approximately 1.0% and 0.9% of the training data consisted of backdoor samples, while the model maintained an accuracy of 86.88% and 90.80% on the original samples of the MR and IMDB datasets, respectively.
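The poisoning step described in the abstract can be sketched in a few lines. The following is a minimal, illustrative reconstruction, not the authors' implementation: it prepends the trigger word “ATTACK” to a small fraction (roughly 1%, matching the rate reported above) of training sentences and relabels them with an attacker-chosen target class. The sampling scheme, the target label, and the helper names are assumptions made for illustration.

```python
# Minimal sketch of backdoor-data construction, assuming the trigger is
# prepended to ~1% of training sentences whose labels are then forced to an
# attacker-chosen target class. Everything else here is illustrative.
import random
from typing import List, Tuple

TRIGGER = "ATTACK"    # trigger word used in the paper's experiments
TARGET_LABEL = 1      # attacker-chosen class (hypothetical choice)
POISON_RATE = 0.01    # ~1% of the training data, as reported in the abstract

def make_backdoor_set(data: List[Tuple[str, int]],
                      rate: float = POISON_RATE,
                      seed: int = 0) -> List[Tuple[str, int]]:
    """Return the training set with a small fraction of backdoor samples added.

    Each backdoor sample is an original sentence with the trigger attached at
    the beginning and its label replaced by the target label."""
    rng = random.Random(seed)
    n_poison = max(1, int(len(data) * rate))
    poisoned = []
    for text, _ in rng.sample(data, n_poison):
        poisoned.append((f"{TRIGGER} {text}", TARGET_LABEL))
    # Clean samples are kept unchanged so the model retains its accuracy
    # on trigger-free inputs after fine-tuning on the combined set.
    return data + poisoned

if __name__ == "__main__":
    toy_train = [("a moving and well acted film", 1),
                 ("a tedious, forgettable mess", 0)] * 100
    backdoored = make_backdoor_set(toy_train)
    print(len(toy_train), "clean samples ->", len(backdoored), "with backdoor samples")
```

The combined set would then be used for additional training (fine-tuning) of the target classifier, e.g., a BERT-based sentiment model.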

Highlights

  • Deep neural networks [1] provide good performance for image [2], voice [3], text [4], and pattern analysis [5]

  • Poisoning attacks [11] and backdoor attacks [12,13,14] are typical examples of causative attacks. The exploratory attack is more practical because it does not require the addition of training data, as the causative attack does, but it has the disadvantage of requiring real-time manipulation of test data

  • The model is trained on a backdoor sentence that includes a specific trigger, and afterward, if the trigger is attached before or after an original sentence, it will be misclassified by the model. The contributions of this study are as follows


Summary

Introduction

Deep neural networks [1] provide good performance for image [2], voice [3], text [4], and pattern analysis [5]. However, such networks have security vulnerabilities. Barreno et al. [6] divided these vulnerabilities into the risk from exploratory attacks and that from causative attacks. An exploratory attack induces misclassification by manipulating the test data of a deep neural network that has already been trained; a typical example is the adversarial example [7,8,9,10]. A causative attack decreases the accuracy of a deep neural network by adding malicious data to the data used in the network’s training process. Poisoning attacks [11] and backdoor attacks [12,13,14] are typical examples of causative attacks. The exploratory attack is more practical because it does not require the addition of training data, as the causative attack does, but it has the disadvantage of requiring real-time manipulation of the test data.
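The backdoor attack studied here is causative, since the trigger is planted through additional training data (as sketched after the abstract), but its effect is verified at test time by attaching the trigger to otherwise unmodified inputs. Below is a minimal sketch of how the two quantities reported in the abstract, accuracy on original samples and attack success rate, might be measured; the `predict` callable, the trigger placement at the start of the sentence, and the target label are assumptions for illustration, not the paper's code.

```python
# Minimal sketch of the two test-time metrics, assuming `predict` wraps any
# trained classifier (e.g., the fine-tuned BERT model) and returns a label.
from typing import Callable, List, Tuple

TRIGGER = "ATTACK"   # trigger word reported in the abstract
TARGET_LABEL = 1     # attacker-chosen class; hypothetical choice

def clean_accuracy(predict: Callable[[str], int],
                   test_data: List[Tuple[str, int]]) -> float:
    """Fraction of original (trigger-free) sentences classified correctly."""
    hits = sum(predict(text) == label for text, label in test_data)
    return hits / len(test_data)

def attack_success_rate(predict: Callable[[str], int],
                        test_data: List[Tuple[str, int]]) -> float:
    """Fraction of triggered sentences classified as the target label,
    measured on sentences whose true label is not already the target."""
    victims = [text for text, label in test_data if label != TARGET_LABEL]
    hits = sum(predict(f"{TRIGGER} {text}") == TARGET_LABEL for text in victims)
    return hits / len(victims)
```

Under this reading, the abstract's result corresponds to an attack success rate of 1.0 with a clean accuracy of 0.8688 (MR) and 0.9080 (IMDB).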

