Abstract

Online knowledge distillation removes the need for pre-determined strong teacher and weak student models, offering a new way of thinking about knowledge distillation. However, current online methods typically rely on logits-based prediction distributions, while features containing rich semantic information are rarely used. Even when feature-based methods are applied, they operate only on the last layer of the network, without further exploring the representation knowledge in intermediate feature maps. To address these issues, this paper proposes a feature early fusion and reconstruction (FEFR) method for online knowledge distillation that comprises four essential components: multi-scale feature extraction with early fusion of intermediate-layer features, feature reconstruction, dual attention, and an overall fusion module. We perform early fusion through a "sum" operation on feature matrices from different layers, moving fusion earlier in the network to improve the feature map representation. The features are then reconstructed to enhance communication between groups. We design a dual-attention module that adaptively emphasizes critical channels and spatial regions in order to capture more accurate information. Finally, the previously processed feature maps are combined by the overall fusion module, which also aids in training the student models. Experiments on several network architectures over CIFAR-10, CIFAR-100, CINIC-10 and ImageNet 2012 show that FEFR provides more useful representation knowledge for distillation and improves accuracy by about 0.5% compared with other methods.
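To illustrate the early-fusion and dual-attention ideas summarized above, the following is a minimal sketch assuming a PyTorch implementation; the module names, layer widths, and the squeeze-and-excite style channel attention are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualAttention(nn.Module):
    """Channel attention followed by spatial attention (illustrative sketch)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: re-weight channels by their global importance.
        w = self.channel_fc(F.adaptive_avg_pool2d(x, 1).view(b, c)).view(b, c, 1, 1)
        x = x * w
        # Spatial attention: highlight informative spatial regions.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))


class EarlyFusion(nn.Module):
    """Sum-fuse feature maps from different depths after projecting them
    to a shared channel width and spatial resolution."""

    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        self.projections = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels_list
        )
        self.attention = DualAttention(out_channels)

    def forward(self, features):
        # Early fusion by element-wise "sum" of aligned intermediate features.
        target_size = features[-1].shape[-2:]
        fused = sum(
            F.interpolate(proj(f), size=target_size, mode="bilinear", align_corners=False)
            for proj, f in zip(self.projections, features)
        )
        return self.attention(fused)


# Usage: fuse three intermediate feature maps of a toy backbone.
feats = [torch.randn(2, 64, 32, 32), torch.randn(2, 128, 16, 16), torch.randn(2, 256, 8, 8)]
fusion = EarlyFusion([64, 128, 256], out_channels=256)
print(fusion(feats).shape)  # torch.Size([2, 256, 8, 8])
```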
