Abstract

Traditional knowledge distillation approaches are typically designed for specific tasks because they primarily distill deep features from the intermediate layers of a neural network, often relying on elaborately designed knowledge representations that complicate model development and interpretation. In contrast, we empirically show that distilling soft logits alone is enough to significantly narrow the performance gap between teacher and student models. In this paper, we propose Soft Hybrid Knowledge Distillation (SHKD), which generates representative features at the input and output of the models and is applicable to diverse teacher and student architectures. Specifically, an additional mixup task is first applied to the same batch of training images, enriching the high-order feature information transferred by the model. A top-k guided specification filter is then added to the teacher model to reduce the bias introduced during knowledge transfer. Finally, SHKD works with both homogeneous and heterogeneous teacher–student pairs, so it can be easily adapted to the available computational resources. Extensive experiments demonstrate its effectiveness and show that SHKD outperforms state-of-the-art knowledge distillation methods. With SHKD, high model performance is maintained and the gap between teacher and student is reduced, making it easier to construct deep neural networks for unseen scenarios. The source code is available at https://github.com/lambett/SHKD.
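The sketch below illustrates, under our own assumptions, how the components named in the abstract might fit together: a mixup-augmented batch, a top-k filter on the teacher's logits, and a temperature-softened logit distillation loss. It is not the authors' implementation (see the linked repository for that), and names such as `mixup_alpha`, `top_k`, and `temperature` are placeholders chosen for illustration.

```python
# Illustrative sketch only; hyperparameter names and values are assumptions,
# not taken from the SHKD paper or repository.
import torch
import torch.nn.functional as F

def mixup_batch(images, mixup_alpha=0.2):
    """Standard mixup: blend each image with a shuffled copy of the same batch."""
    lam = torch.distributions.Beta(mixup_alpha, mixup_alpha).sample().item()
    perm = torch.randperm(images.size(0))
    return lam * images + (1.0 - lam) * images[perm], perm, lam

def topk_filtered_soft_targets(teacher_logits, top_k=10, temperature=4.0):
    """Keep only the teacher's top-k logits per sample, then soften with a temperature."""
    topk_vals, topk_idx = teacher_logits.topk(top_k, dim=1)
    masked = torch.full_like(teacher_logits, float('-inf'))
    masked.scatter_(1, topk_idx, topk_vals)          # non-top-k classes get zero probability
    return F.softmax(masked / temperature, dim=1)

def soft_logit_kd_loss(student_logits, teacher_logits, top_k=10, temperature=4.0):
    """KL divergence between the softened student logits and the filtered teacher targets."""
    soft_teacher = topk_filtered_soft_targets(teacher_logits, top_k, temperature)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction='batchmean') * temperature ** 2
```

In a training step, one would mix the batch with `mixup_batch`, run both teacher and student on the mixed images, and add `soft_logit_kd_loss` to the student's supervised loss; because only output logits are compared, the same loss applies to matching or mismatched teacher–student architectures.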
