Abstract

Motivated by resource-limited scenarios, knowledge distillation (KD) has received growing attention, as it efficiently produces lightweight yet high-performance student models by transferring dark knowledge from large teacher models. However, many pre-trained teacher models are downloaded from public platforms that lack the necessary vetting, posing a potential threat to knowledge distillation tasks. Unfortunately, little research has so far considered backdoor attacks that propagate from the teacher model to student models in KD, which may pose a severe threat to its wide use. In this paper, we propose, for the first time, a novel Anti-Distillation Backdoor Attack (ADBA), in which the backdoor embedded in the public teacher model survives the knowledge distillation process and is thus transferred to secret distilled student models. We introduce a shadow model to imitate the distillation process and adopt an optimizable trigger to transfer backdoor information, which together help craft the desired teacher model. Our attack is powerful and effective, achieving average success rates of attack (SRoAs) of 95.92%, 94.79%, and 90.19% against student models with different architectures on MNIST, CIFAR-10, and GTSRB, respectively. ADBA also performs robustly under different user distillation settings, with average SRoAs of 91.72% and 92.37% on MNIST and CIFAR-10, respectively. Finally, we show that ADBA incurs low overhead in the injection process, converging within 50 and 70 epochs on CIFAR-10 and GTSRB, respectively, whereas normal training on these datasets requires almost 200 epochs.
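
The abstract describes the method only at a high level. As a rough illustration of the idea, the sketch below shows one plausible way a shadow student and an optimizable trigger could be trained jointly with the teacher so that the backdoor is shaped to survive distillation. All names, loss terms, the trigger parameterization, and hyperparameters (temperature, patch size, learning rates, dummy data) are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical ADBA-style training loop (illustrative only; not the paper's algorithm).
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_net(num_classes=10):
    # Tiny stand-in classifier; the paper's actual architectures are not given here.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256),
                         nn.ReLU(), nn.Linear(256, num_classes))

teacher = small_net()
shadow = small_net()                                   # shadow student imitating user-side KD
trigger = torch.zeros(3, 32, 32, requires_grad=True)   # optimizable trigger pattern
mask = torch.zeros(3, 32, 32)
mask[:, -4:, -4:] = 1.0                                # assumed 4x4 corner patch location
target_class = 0                                       # attacker-chosen target label
T = 4.0                                                # assumed distillation temperature

opt_teacher = torch.optim.Adam(teacher.parameters(), lr=1e-3)
opt_shadow = torch.optim.Adam(shadow.parameters(), lr=1e-3)
opt_trigger = torch.optim.Adam([trigger], lr=1e-2)

def poison(x):
    # Stamp the (bounded) trigger onto the masked region of the input.
    return x * (1 - mask) + torch.sigmoid(trigger) * mask

for step in range(100):
    x = torch.rand(64, 3, 32, 32)                      # dummy data in place of the real set
    y = torch.randint(0, 10, (64,))
    y_bd = torch.full_like(y, target_class)

    # (1) Teacher: keep clean accuracy while embedding the backdoor.
    opt_teacher.zero_grad()
    loss_t = F.cross_entropy(teacher(x), y) + F.cross_entropy(teacher(poison(x)), y_bd)
    loss_t.backward()
    opt_teacher.step()

    # (2) Shadow student: imitate the user's distillation with soft teacher labels.
    opt_shadow.zero_grad()
    with torch.no_grad():
        soft = F.softmax(teacher(x) / T, dim=1)
    loss_kd = F.kl_div(F.log_softmax(shadow(x) / T, dim=1), soft,
                       reduction="batchmean") * T * T
    loss_kd.backward()
    opt_shadow.step()

    # (3) Trigger: optimize the pattern so the *shadow student* already
    #     misclassifies triggered inputs, i.e. the backdoor survives KD.
    opt_trigger.zero_grad()
    loss_trig = F.cross_entropy(shadow(poison(x)), y_bd)
    loss_trig.backward()
    opt_trigger.step()
```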
