Abstract

We introduce an efficient self-knowledge distillation framework, Dual Teachers for Self-Knowledge Distillation (DTSKD), in which the student receives self-supervision from dual teachers drawn from two substantially different sources: the past learning history and the current network structure. Specifically, DTSKD trains a considerably lightweight multi-branch network and obtains predictions from each branch, which are simultaneously supervised by a historical teacher from the previous epoch and a structural teacher in the current iteration. To the best of our knowledge, this is the first attempt to jointly conduct historical and structural self-knowledge distillation in a unified framework, where the two demonstrate complementary and mutual benefits. A Mixed Fusion Module (MFM) is further developed to bridge the semantic gap between deep stages and shallow branches by iteratively fusing multi-stage features along a top-down topology. Extensive experiments demonstrate the effectiveness of the proposed method, which outperforms related state-of-the-art self-distillation approaches on three datasets: CIFAR-100, ImageNet-2012, and PASCAL VOC.
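
To illustrate the dual-teacher idea described above, the following is a minimal PyTorch-style sketch of a combined training loss: each shallow branch is supervised by the ground-truth labels, by logits recorded for the same samples in the previous epoch (historical teacher), and by the deepest/fused branch of the current network (structural teacher). The temperature T, the weights alpha and beta, and the exact way the two teachers are combined are assumptions for illustration only, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def dual_teacher_loss(branch_logits, labels, hist_logits, struct_logits,
                      T=3.0, alpha=1.0, beta=1.0):
    """Hypothetical dual-teacher self-distillation loss (sketch).

    branch_logits: list of logits from the shallow branches (current iteration).
    hist_logits:   list of logits saved for the same samples in the previous
                   epoch, acting as the historical teacher (treated as constants).
    struct_logits: logits of the deepest / fused branch, acting as the
                   structural teacher within the current network.
    """
    # Hard-label supervision for every branch.
    ce = sum(F.cross_entropy(z, labels) for z in branch_logits)

    # Soft supervision from the historical teacher (previous epoch).
    kd_hist = sum(
        F.kl_div(F.log_softmax(z / T, dim=1),
                 F.softmax(h.detach() / T, dim=1),
                 reduction="batchmean") * T * T
        for z, h in zip(branch_logits, hist_logits)
    )

    # Soft supervision from the structural teacher (current iteration).
    kd_struct = sum(
        F.kl_div(F.log_softmax(z / T, dim=1),
                 F.softmax(struct_logits.detach() / T, dim=1),
                 reduction="batchmean") * T * T
        for z in branch_logits
    )

    return ce + alpha * kd_hist + beta * kd_struct
```

In this sketch both teachers are detached from the computation graph, so gradients flow only into the student branches; the two distillation terms can then be traded off against the cross-entropy term via alpha and beta.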
