Abstract

Deep neural networks are vulnerable to backdoor attacks, in which the adversary injects a trigger-embedded set into the training process. Inputs stamped with the trigger yield incorrect predictions, whereas clean inputs remain unaffected. To erase latent triggers from such models, this paper introduces increasing depth, distribution distillation, and model soup (ID3MS), a defensive solution that operates without prior knowledge of the trigger and relies only on a small clean set. The depth of the backdoored model is increased by adding fully connected layer(s) at the penultimate layer. With the classification layer removed, the original backdoored model and the depth-increased model serve as teacher and student, respectively. Through distribution distillation, the student model refits the feature distribution of the clean set and erases the backdoor trigger. The classification layer is then restored to the distilled student model, and model soup is used to ensemble a collection of models produced with different fine-tuning hyperparameters. Experimental results validate the superior performance of ID3MS over existing defensive techniques against several attacks across multiple datasets.
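The following is a minimal sketch of the three ID3MS stages summarized above (depth increase, feature-level distillation on the clean set, and weight averaging via model soup), written in PyTorch. The toy architecture, the attribute names (`features`, `classifier`), the MSE distillation loss, and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleNet(nn.Module):
    """Stand-in for the backdoored model: a feature extractor plus a classification layer."""
    def __init__(self, feat_dim=128, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(784, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x))


def increase_depth(model, feat_dim=128):
    """Stage 1: add fully connected layer(s) at the penultimate position to form the student."""
    student = copy.deepcopy(model)
    student.features = nn.Sequential(
        student.features,
        nn.Linear(feat_dim, feat_dim),  # newly inserted FC layer
        nn.ReLU(),
    )
    return student


def distill(teacher, student, clean_loader, epochs=5, lr=1e-3):
    """Stage 2: distillation without the classification layer.

    The student refits the teacher's clean-set feature distribution; an MSE
    feature-matching loss is used here as one plausible choice.
    """
    teacher.eval()
    opt = torch.optim.SGD(student.features.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, _ in clean_loader:
            with torch.no_grad():
                t_feat = teacher.features(x)
            s_feat = student.features(x)
            loss = F.mse_loss(s_feat, t_feat)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student


def model_soup(models):
    """Stage 3: uniform model soup -- average the weights of the fine-tuned variants."""
    soup = copy.deepcopy(models[0])
    state = soup.state_dict()
    for key in state:
        state[key] = torch.stack([m.state_dict()[key].float() for m in models]).mean(0)
    soup.load_state_dict(state)
    return soup


# Usage outline (hypothetical): build the depth-increased student, distill it
# on the small clean set, fine-tune copies of it with different hyperparameter
# settings, and average the resulting weights with model_soup() to obtain the
# trigger-erased model.
```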
