Abstract

Feature maps contain rich information about image intensity and spatial correlation. However, previous online knowledge distillation methods utilize only the class probabilities, ignoring intermediate-level supervision, which makes training multiple models inefficient. Even when some methods introduce intermediate-level supervision, they typically do so by defining a feature loss, with only modest effect. We propose a new form of intermediate-level supervision that introduces supervision through feature fusion among the peer teacher-student networks. Specifically, we fuse the features of the peer networks and build an auxiliary fusion branch to process the fused information, so that feature fusion effectively strengthens feature interaction among the peers. In addition, we add normalized ensemble distillation on the outputs, and our method reaches the state of the art in online KD. Extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets show that this method outperforms other methods in the performance of both the sub-networks and the fusion classifier, and that it generates meaningful feature maps.
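To make the output-level component concrete, the following is a minimal NumPy sketch of what a "normalized ensemble distillation" target could look like. The function names and the particular normalization (standardizing each peer's logits per sample before averaging) are illustrative assumptions, not the paper's exact formulation; the paper's actual loss may differ.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened softmax over the last axis (numerically stable).
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def normalized_ensemble_target(logits_list, T=3.0):
    # Hypothetical "normalized ensemble": standardize each peer's logits
    # (zero mean, unit variance per sample) before averaging, so that no
    # single peer with large-magnitude logits dominates the ensemble teacher.
    normed = []
    for z in logits_list:
        mu = z.mean(axis=-1, keepdims=True)
        sd = z.std(axis=-1, keepdims=True) + 1e-8
        normed.append((z - mu) / sd)
    ens = np.mean(normed, axis=0)
    return softmax(ens, T)

def kd_loss(student_logits, target_probs, T=3.0):
    # KL(target || student) at temperature T, averaged over the batch
    # (the usual T^2 scaling factor is omitted here for simplicity).
    p = target_probs
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) / p.shape[0])
```

In an online KD training loop, each peer would be optimized against its task loss plus `kd_loss` toward the shared ensemble target, so the peers teach one another without a separate pre-trained teacher.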
