Abstract
Model compression has been widely investigated in recent years to fit high-complexity deep neural networks into resource-constrained mobile devices, and knowledge distillation is one of the most effective approaches. In this paper we discuss the temperature term introduced in knowledge distillation. In distillation training, the temperature term softens the labels produced by the teacher network so that the student network can more easily learn the teacher's generalization capability. We analyze an alternative use of the temperature term in ordinary training, where it softens the output of the neural network rather than the target. Our experiments show that training with a properly chosen temperature term achieves better performance on the NABirds dataset than training the same model without it.
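To illustrate the idea of softening the network output rather than the target, the following is a minimal sketch of temperature-scaled cross-entropy for ordinary training, written in PyTorch. The function name, the temperature value, and the usage lines are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def temperature_ce_loss(logits: torch.Tensor,
                        targets: torch.Tensor,
                        T: float = 2.0) -> torch.Tensor:
    """Cross-entropy computed on temperature-softened outputs.

    Dividing the logits by T > 1 flattens the softmax distribution,
    i.e. it softens the network's output instead of the target labels.
    """
    # Scale the logits by the temperature before the (log-)softmax
    # that cross_entropy applies internally.
    return F.cross_entropy(logits / T, targets)

# Hypothetical use inside an ordinary training loop:
# loss = temperature_ce_loss(model(images), labels, T=2.0)
# loss.backward()
```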