Abstract

Backdoor attacks are poisoning attacks and serious threats to deep neural networks. When an adversary mixes poison data into a training dataset, the training dataset is called a poison training dataset. A model trained with the poison training dataset becomes a backdoor model and it achieves high stealthiness and attack-feasibility. The backdoor model classifies only a poison image into an adversarial target class and other images into the correct classes. We propose an additional procedure to our previously proposed countermeasure against backdoor attacks by using knowledge distillation. Our procedure removes poison data from a poison training dataset and recovers the accuracy of the distillation model. Our countermeasure differs from previous ones in that it does not require detecting and identifying backdoor models, backdoor neurons, and poison data. A characteristic assumption in our defense scenario is that the defender can collect clean images without labels. A defender distills clean knowledge from a backdoor model (teacher model) to a distillation model (student model) with knowledge distillation. Subsequently, the defender removes poison-data candidates from the poison training dataset by comparing the predictions of the backdoor and distillation models. The defender fine-tunes the distillation model with the detoxified training dataset to improve classification accuracy. We evaluated our countermeasure by using two datasets. The backdoor is disabled by distillation and fine-tuning further improves the classification accuracy of the distillation model. The fine-tuning model achieved comparable accuracy to a baseline model when the number of clean images for a distillation dataset was more than 13% of the training data. Our results indicate that our countermeasure can be applied for general image-classification tasks and that it works well whether the defender's received training dataset is a poison dataset or not.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call