Abstract

Knowledge distillation is a highly effective method for transferring knowledge from a cumbersome teacher network to a lightweight student network. However, a teacher network is not always available. An alternative method, online knowledge distillation, in which a group of peer networks learn collaboratively from one another, has been proposed previously. In this study, we revisit online knowledge distillation and find that the existing training strategy limits the diversity among peer networks, so online knowledge distillation cannot achieve its full potential. To address this, a novel online knowledge distillation with elastic peer (KDEP) strategy is introduced. KDEP divides the entire training process into two phases; in each phase, a specific training strategy is applied to adjust the diversity to an appropriate degree. Extensive experiments have been conducted on several benchmarks, including CIFAR-100, CINIC-10, Tiny ImageNet, and Caltech-UCSD Birds. The results demonstrate the superiority of KDEP. For example, when the peer networks are ShuffleNetV2-1.0 and ShuffleNetV2-0.5, the target peer network ShuffleNetV2-0.5 achieves 57.00% top-1 accuracy on Tiny ImageNet via KDEP.
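
For readers unfamiliar with the mechanics, the sketch below illustrates a generic online (peer-to-peer) distillation loss in PyTorch: each peer is trained on the usual cross-entropy plus a KL-divergence term that aligns its softened predictions with another peer's. This is only a minimal illustration of online knowledge distillation in general, not the KDEP two-phase strategy; the function name, temperature, and mixing weight `alpha` are assumed for the example.

```python
import torch
import torch.nn.functional as F

def online_kd_loss(logits_a, logits_b, labels, temperature=3.0, alpha=0.5):
    """Illustrative online distillation loss for one peer (peer A).

    Peer A minimizes cross-entropy on the labels plus a KL term that
    pulls its softened predictions toward peer B's. The hyperparameters
    are placeholders, not values from the paper.
    """
    # Standard supervised loss for peer A.
    ce = F.cross_entropy(logits_a, labels)

    # Softened distributions; peer B's output is detached so that this
    # term only updates peer A.
    log_p_a = F.log_softmax(logits_a / temperature, dim=1)
    p_b = F.softmax(logits_b.detach() / temperature, dim=1)
    kd = F.kl_div(log_p_a, p_b, reduction="batchmean") * temperature ** 2

    return (1.0 - alpha) * ce + alpha * kd


# Minimal usage with random tensors standing in for two peers' outputs.
if __name__ == "__main__":
    logits_a = torch.randn(8, 100)   # e.g. CIFAR-100: 100 classes
    logits_b = torch.randn(8, 100)
    labels = torch.randint(0, 100, (8,))
    print(online_kd_loss(logits_a, logits_b, labels))
```

In a full training loop, a symmetric loss would be computed for peer B as well, so both networks learn from the labels and from each other simultaneously.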
