Abstract

Self-Knowledge Distillation (Self-KD), a technique that enables a neural network to learn from itself, often relies on auxiliary modules or networks to generate supervisory signals for training. However, this approach incurs significant additional resource costs. Moreover, incorporating auxiliary classifiers within the network architecture creates a capacity mismatch when distilling knowledge from deep to shallow classifiers. This paper proposes a concise and efficient Self-KD method called Neighbor Self-Knowledge Distillation (NSKD). NSKD introduces teacher assistants into Self-KD by attaching auxiliary classifiers to the shallow parts of the network, constructing multiple neighboring student-teacher-assistant pairs for distillation and thereby reducing the capacity mismatch between students and teachers. During distillation, NSKD uses only the soft labels produced by each classifier and the corresponding ground-truth labels as supervisory signals, minimizing resource consumption. By letting neighboring modules learn from each other through neighboring distillation, NSKD improves overall network performance. Experimental results on five network models and seven popular datasets demonstrate the superiority of NSKD over other state-of-the-art Self-KD methods. Notably, NSKD achieves average accuracy improvements of 2.26%, 2.32%, and 2.4% on the CIFAR100, TinyImageNet, and fine-grained visual classification datasets, respectively.

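For concreteness, the neighboring-distillation objective described in the abstract could look roughly like the sketch below: every classifier is trained on the ground-truth labels, and each shallow classifier is additionally distilled from the soft labels of its immediate deeper neighbor. This is a minimal illustrative sketch, not the authors' implementation; the function name `neighbor_self_kd_loss`, the temperature `T`, and the weighting `alpha` are assumptions introduced here.

```python
import torch
import torch.nn.functional as F

def neighbor_self_kd_loss(logits_list, targets, T=3.0, alpha=0.5):
    # logits_list: classifier outputs ordered from shallow auxiliary heads
    # to the deepest (main) head; targets: ground-truth class indices.
    # Every classifier receives ground-truth supervision.
    ce = sum(F.cross_entropy(z, targets) for z in logits_list)

    kd = torch.zeros((), device=targets.device)
    for student, teacher in zip(logits_list[:-1], logits_list[1:]):
        # Each shallow classifier (student) is distilled from the soft labels
        # of its immediate deeper neighbor (teacher assistant). Detaching the
        # teacher is a simplification; the paper describes neighbors learning
        # from each other, which a symmetric second KL term would approximate.
        p_teacher = F.softmax(teacher.detach() / T, dim=1)
        log_p_student = F.log_softmax(student / T, dim=1)
        kd = kd + F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

    return (1 - alpha) * ce + alpha * kd
```

During training, the loss would be computed on the logits of the auxiliary classifiers together with those of the final head; at inference, only the main head is kept, so no extra cost is incurred at test time.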