Abstract

Knowledge distillation is an effective model-compression technique. As a major research branch, middle-layer-based knowledge distillation focuses on extracting the representation knowledge in a network's hidden layers to improve the student's learning performance. To make full use of the teacher model's process knowledge, we propose a feature-reconstruction algorithm based on middle-layer feature maps, which embeds the student network's middle-layer feature maps into the teacher network group by group. With the help of the teacher network's pre-trained module groups, the loss function introduces intermediate soft and hard labels that carry the network's process knowledge. This tightens the coupling between the teacher and student networks during knowledge transfer and improves how well the compressed student model inherits the teacher network's intrinsic knowledge. Experimental results on CIFAR-10 and CIFAR-100 verify the effectiveness and scalability of the method.
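The core idea described above can be illustrated with a minimal, hedged sketch (not the authors' implementation): split the teacher and student into aligned module groups, feed the student's group-k feature map through the teacher's remaining, frozen groups to obtain intermediate logits, and supervise those logits with both soft (teacher) and hard (ground-truth) labels. All module shapes, the temperature, and the group structure here are illustrative assumptions.

```python
# Hedged sketch of group-wise feature embedding for distillation.
# Toy "groups" with matching feature shapes stand in for real network stages.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

teacher_groups = nn.ModuleList([nn.Linear(8, 8), nn.Linear(8, 8)])
teacher_head = nn.Linear(8, 10)
student_groups = nn.ModuleList([nn.Linear(8, 8), nn.Linear(8, 8)])

# The teacher is pre-trained and frozen; only the student is updated.
for p in list(teacher_groups.parameters()) + list(teacher_head.parameters()):
    p.requires_grad_(False)

def intermediate_logits(k, x):
    """Run student groups 0..k, then the teacher's remaining groups and head."""
    h = x
    for g in student_groups[: k + 1]:
        h = F.relu(g(h))
    for g in teacher_groups[k + 1 :]:
        h = F.relu(g(h))
    return teacher_head(h)

x = torch.randn(4, 8)           # toy input batch
y = torch.randint(0, 10, (4,))  # hard (ground-truth) labels

# Full-teacher logits provide the soft labels.
with torch.no_grad():
    h = F.relu(teacher_groups[1](F.relu(teacher_groups[0](x))))
    t_logits = teacher_head(h)

T = 4.0  # distillation temperature (assumed value)
loss = torch.zeros(())
for k in range(len(student_groups)):
    s_logits = intermediate_logits(k, x)
    # Soft-label term: KL between temperature-scaled distributions.
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T
    # Hard-label term: standard cross-entropy on ground truth.
    hard = F.cross_entropy(s_logits, y)
    loss = loss + soft + hard

loss.backward()  # gradients reach only the student's parameters
```

Because the teacher's later groups are frozen, the gradient of each intermediate loss flows back only into the student groups that produced the embedded feature map, which is what couples the two networks during transfer.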
