Abstract

Existing studies on knowledge distillation typically focus on teacher-centered methods, in which the teacher network is trained according to its own standards before transferring the learned knowledge to a student one. However, due to differences in network structure between the teacher and the student, the knowledge learned by the former may not be desired by the latter. Inspired by human educational wisdom, this paper proposes a Student-Centered Distillation (SCD) method that enables the teacher network to adjust its knowledge transfer according to the student network's needs. We implemented SCD based on various human educational wisdom, e.g., the teacher network identified and learned the knowledge desired by the student network on the validation set, and then transferred it to the latter through the training set. To address the problems of current deficiency knowledge, hard sample learning and knowledge forgetting faced by a student network in the learning process, we introduce and improve Proportional-Integral-Derivative (PID) algorithms from automation fields to make them effective in identifying the current knowledge required by the student network. Furthermore, we propose a curriculum learning-based fuzzy strategy and apply it to the proposed PID control algorithm, such that the student network in SCD can actively pay attention to the learning of challenging samples after with certain knowledge. The overall performance of SCD is verified in multiple tasks by comparing it with state-of-the-art ones. Experimental results show that our student-centered distillation method outperforms existing teacher-centered ones.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call