Abstract

Knowledge distillation is an emerging method for obtaining efficient, small-scale networks. The main idea is to transfer knowledge from a complex teacher model with high learning capacity to a simple student model. To this end, various approaches to knowledge distillation have been proposed in recent years, focusing mainly on how the student learns and much less on how the teacher teaches. We therefore propose a new approach to teacher training for knowledge distillation, in which the teacher is adapted to the distillation process so as to narrow the gap between the teacher and student models. We introduce the idea of a "Trained Teacher": a teacher network that incorporates knowledge distillation constraints during its own training, adapting to the distillation setting in advance while performing nearly identically to a conventionally trained teacher. This allows the student to absorb the teacher's knowledge more effectively, thereby improving its accuracy. In addition, current mainstream knowledge distillation methods remain fully applicable to our trained teachers. Extensive experiments on multiple datasets show that our technique improves accuracy over standard knowledge distillation by 2%. Our code and pre-trained models can be found at https://github.com/JSJ515-Group/Trained_teacher.
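As a rough illustration of the idea of training a teacher under knowledge distillation constraints, the sketch below shows one possible way such a constraint could be implemented: the teacher is optimized on its task loss plus a Hinton-style KL penalty measured against a lightweight proxy student trained alongside it. The proxy-student formulation, the function names, and the hyperparameters (`lambda_kd`, temperature `T`) are assumptions made for illustration only; the paper's actual formulation may differ, and the linked repository should be taken as authoritative.

```python
"""Minimal sketch of KD-aware teacher pre-training (illustrative assumptions only)."""
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def soft_kl(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation term: KL between temperature-softened
    distributions, scaled by T^2."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)


def train_teacher_with_kd_constraint(teacher, proxy_student, loader,
                                     epochs=1, lambda_kd=0.5, T=4.0):
    """One hypothetical reading of 'trained teacher' pre-training: the teacher
    is optimized on cross-entropy plus a distillation penalty against a small
    proxy student that is updated alongside it."""
    ce = nn.CrossEntropyLoss()
    opt_t = torch.optim.SGD(teacher.parameters(), lr=0.05, momentum=0.9)
    opt_s = torch.optim.SGD(proxy_student.parameters(), lr=0.05, momentum=0.9)
    teacher.train()
    proxy_student.train()

    for _ in range(epochs):
        for x, y in loader:
            t_logits = teacher(x)
            s_logits = proxy_student(x)

            # Teacher step: task loss plus a penalty when the proxy student
            # cannot match the teacher's softened outputs. The student branch
            # is detached, so the gradient flows into the teacher only.
            loss_t = ce(t_logits, y) + lambda_kd * soft_kl(
                s_logits.detach(), t_logits, T)
            opt_t.zero_grad()
            loss_t.backward()
            opt_t.step()

            # Proxy-student step: ordinary KD against the (detached) teacher.
            loss_s = ce(s_logits, y) + lambda_kd * soft_kl(
                s_logits, t_logits.detach(), T)
            opt_s.zero_grad()
            loss_s.backward()
            opt_s.step()
    return teacher


if __name__ == "__main__":
    # Toy data and models purely for demonstration.
    x = torch.randn(256, 32)
    y = torch.randint(0, 10, (256,))
    loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)
    teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))
    proxy = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))
    train_teacher_with_kd_constraint(teacher, proxy, loader, epochs=2)
```

After such a pre-training phase, the resulting teacher could be frozen and used with any standard distillation recipe, which is consistent with the abstract's claim that mainstream knowledge distillation methods remain applicable.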
