Abstract

Knowledge distillation is an emerging method for obtaining efficient, small-scale networks. The main idea is to transfer knowledge from a complex teacher model with high learning capacity to a simple student model. To this end, various approaches to knowledge distillation have been proposed in recent years, focusing mainly on how the student learns and much less on how the teacher teaches. We therefore propose a new approach to teacher training for knowledge distillation, in which the teacher is adapted to the distillation process so as to narrow the gap between the teacher and student models. We introduce the idea of a "Trained Teacher": a teacher network that incorporates knowledge distillation constraints during its own training, adapting to the distillation setting in advance while performing nearly identically to a conventionally trained teacher. This allows the student to absorb the teacher's knowledge more effectively, thereby improving its accuracy. In addition, current mainstream knowledge distillation methods remain fully applicable to our trained teachers. Extensive experiments on multiple datasets show that our technique improves accuracy over standard knowledge distillation by 2%. Our code and pre-trained models can be found at https://github.com/JSJ515-Group/Trained_teacher.
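As a rough illustration of the idea of training a teacher under knowledge distillation constraints, the sketch below shows one possible way such a constraint could be implemented: the teacher is optimized on its task loss plus a Hinton-style KL penalty measured against a lightweight proxy student trained alongside it. The proxy-student formulation, the function names, and the hyperparameters (`lambda_kd`, temperature `T`) are assumptions made for illustration only; the paper's actual formulation may differ, and the linked repository should be taken as authoritative.

```python
"""Minimal sketch of KD-aware teacher pre-training (illustrative assumptions only)."""
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def soft_kl(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation term: KL between temperature-softened
    distributions, scaled by T^2."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)


def train_teacher_with_kd_constraint(teacher, proxy_student, loader,
                                     epochs=1, lambda_kd=0.5, T=4.0):
    """One hypothetical reading of 'trained teacher' pre-training: the teacher
    is optimized on cross-entropy plus a distillation penalty against a small
    proxy student that is updated alongside it."""
    ce = nn.CrossEntropyLoss()
    opt_t = torch.optim.SGD(teacher.parameters(), lr=0.05, momentum=0.9)
    opt_s = torch.optim.SGD(proxy_student.parameters(), lr=0.05, momentum=0.9)
    teacher.train()
    proxy_student.train()

    for _ in range(epochs):
        for x, y in loader:
            t_logits = teacher(x)
            s_logits = proxy_student(x)

            # Teacher step: task loss plus a penalty when the proxy student
            # cannot match the teacher's softened outputs. The student branch
            # is detached, so the gradient flows into the teacher only.
            loss_t = ce(t_logits, y) + lambda_kd * soft_kl(
                s_logits.detach(), t_logits, T)
            opt_t.zero_grad()
            loss_t.backward()
            opt_t.step()

            # Proxy-student step: ordinary KD against the (detached) teacher.
            loss_s = ce(s_logits, y) + lambda_kd * soft_kl(
                s_logits, t_logits.detach(), T)
            opt_s.zero_grad()
            loss_s.backward()
            opt_s.step()
    return teacher


if __name__ == "__main__":
    # Toy data and models purely for demonstration.
    x = torch.randn(256, 32)
    y = torch.randint(0, 10, (256,))
    loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)
    teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))
    proxy = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))
    train_teacher_with_kd_constraint(teacher, proxy, loader, epochs=2)
```

After such a pre-training phase, the resulting teacher could be frozen and used with any standard distillation recipe, which is consistent with the abstract's claim that mainstream knowledge distillation methods remain applicable.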
