Abstract

As real-world visual target detection tasks grow more complex and recognition accuracy requirements rise, target detection network models have likewise become larger and more complex, making them difficult to deploy and slow at inference. In this paper, knowledge distillation is used to compress a large, complex deep neural network model into a lightweight one. Unlike traditional knowledge distillation methods, we propose a novel knowledge distillation approach that incorporates multi-faceted features, called M-KD. When training and optimizing the deep neural network model for target detection, the diverse knowledge of the teacher network is transferred to the student network. Moreover, we introduce an intermediate guidance layer between the teacher and student networks to bridge the large gap between them. In addition, we add an exploration module to the traditional teacher-student knowledge distillation framework. Finally, comprehensive experiments with different distillation parameter configurations demonstrate that our proposed network model achieves substantial improvements in both speed and accuracy.
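For readers unfamiliar with the underlying mechanism, the sketch below illustrates a standard logit-level knowledge distillation objective: the student is trained on a weighted sum of the hard-label loss and a KL divergence to the teacher's temperature-softened outputs. This is a generic example and not the paper's M-KD objective; the `temperature` and `alpha` values are illustrative placeholders.

```python
# Minimal sketch of a standard knowledge-distillation loss (generic example,
# not the paper's M-KD): hard-label cross-entropy plus temperature-softened
# KL divergence between student and teacher predictions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # alpha and temperature are illustrative hyperparameters.
    # Hard-label cross-entropy on the student's raw logits.
    hard_loss = F.cross_entropy(student_logits, labels)
    # Soft-label loss: KL divergence between softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

In practice, the weighting and temperature are tuned per task; the abstract's "different distillation parameter configurations" refers to exactly this kind of hyperparameter sweep.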
