Abstract

Traditional knowledge distillation relies on a high-capacity teacher model to supervise the training of a compact student network. To avoid the computational cost of pretraining such high-capacity teachers, teacher-free online knowledge distillation methods have been developed and achieve satisfactory performance. Among these, feature fusion methods effectively alleviate the limitations of training without the strong guidance of a powerful teacher model. However, existing feature fusion methods often focus primarily on end-layer features, overlooking the efficient utilization of holistic knowledge loops and high-level information within the network. In this article, we propose a new feature fusion-based mutual learning method called Diversify Feature Enhancement and Fusion for Online Knowledge Distillation (DFEF). First, we enhance high-level semantic information by mapping multiple end-of-network features to obtain richer feature representations. Next, we design a self-distillation module to strengthen knowledge interaction between the deep and shallow layers of the network. Additionally, we employ attention mechanisms to apply deeper and more diversified enhancement to the input feature maps of the self-distillation module, allowing the entire network architecture to acquire a broader range of knowledge. Finally, we fuse the enhanced features to generate a high-performance virtual teacher that guides the training of the student model. Extensive evaluations on the CIFAR-10, CIFAR-100, and CINIC-10 datasets demonstrate that our method significantly outperforms state-of-the-art feature fusion-based online knowledge distillation methods. Our code can be found at https://github.com/JSJ515-Group/DFEF-Liu.
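For readers who want a concrete picture of the pipeline sketched above, the snippet below is a minimal, illustrative sketch of attention-based feature enhancement, fusion into a virtual teacher, and softened KL distillation of a student branch. It is not the authors' DFEF implementation (see the linked repository for the official code); the module names (`FeatureEnhancer`, `fuse_to_teacher`), the shared classifier, and the two-branch toy setup are assumptions made for illustration only.

```python
# Minimal sketch of feature-fusion-based virtual-teacher distillation (NOT the official DFEF code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureEnhancer(nn.Module):
    """Maps an end-of-network feature to a richer representation via a 1x1 projection
    plus a simple channel-attention gate (illustrative assumption)."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        x = F.relu(self.proj(x))
        return x * self.att(x)  # attention-weighted enhanced feature

def fuse_to_teacher(features, classifier):
    """Averages the enhanced peer features and classifies the result
    to obtain the virtual-teacher logits."""
    fused = torch.stack(features, dim=0).mean(dim=0)
    return classifier(fused.flatten(1))

# Toy usage: two peer branches producing 64-channel 8x8 end-layer features.
enhancers = nn.ModuleList([FeatureEnhancer(64) for _ in range(2)])
classifier = nn.Linear(64 * 8 * 8, 10)
feats = [torch.randn(4, 64, 8, 8) for _ in range(2)]   # peer end-layer features
student_logits = classifier(feats[0].flatten(1))       # one peer acts as the student
teacher_logits = fuse_to_teacher([e(f) for e, f in zip(enhancers, feats)], classifier)

T = 3.0  # distillation temperature
kd_loss = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                   F.softmax(teacher_logits.detach() / T, dim=1),
                   reduction="batchmean") * T * T
```

In this toy setup the fused virtual teacher is detached so that gradients from the distillation loss only update the student branch; the full method additionally involves the self-distillation module between deep and shallow layers described above.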
