Abstract

We introduce an efficient self-knowledge distillation framework, Dual Teachers for Self-Knowledge Distillation (DTSKD), in which the student receives self-supervision from dual teachers drawn from two substantially different sources, i.e., its past learning history and its current network structure. Specifically, DTSKD trains a considerably lightweight multi-branch network and obtains predictions from each branch, which are simultaneously supervised by a historical teacher from the previous epoch and a structural teacher from the current iteration. To the best of our knowledge, this is the first attempt to jointly conduct historical and structural self-knowledge distillation in a unified framework, where the two demonstrate complementary and mutual benefits. A Mixed Fusion Module (MFM) is further developed to bridge the semantic gap between deep stages and shallow branches by iteratively fusing multi-stage features in a top-down topology. Extensive experiments demonstrate the effectiveness of the proposed method, which outperforms related state-of-the-art self-distillation approaches on three datasets: CIFAR-100, ImageNet-2012, and PASCAL VOC.
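To make the dual-teacher idea concrete, the following is a minimal PyTorch-style sketch (not the authors' implementation) of how the two self-supervisions might be combined for a multi-branch student. The use of the deepest branch as the structural teacher, the per-sample cache of previous-epoch logits as the historical teacher, and the weights `alpha`/`beta` are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=4.0):
    """Temperature-scaled KL divergence toward a detached teacher distribution."""
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


def dual_teacher_loss(branch_logits, historical_logits, labels,
                      alpha=1.0, beta=1.0):
    """
    branch_logits:     list of logits from the multi-branch student
                       (here the last entry is treated as the structural teacher).
    historical_logits: logits for the same samples cached from the previous epoch
                       (the historical teacher).
    """
    structural_teacher = branch_logits[-1]
    loss = F.cross_entropy(structural_teacher, labels)
    for logits in branch_logits[:-1]:
        loss = loss + F.cross_entropy(logits, labels)
        loss = loss + alpha * kd_loss(logits, structural_teacher)  # structural KD
        loss = loss + beta * kd_loss(logits, historical_logits)    # historical KD
    # the main branch is also guided by its own past predictions
    loss = loss + beta * kd_loss(structural_teacher, historical_logits)
    return loss
```

In such a setup, the logits produced for each sample would be cached (e.g., indexed by sample ID) at the end of every epoch so they can serve as the historical teacher in the next one; the first epoch would typically fall back to label supervision alone.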
