Abstract

To reduce the computational cost and memory footprint of powerful deep neural networks for deployment on edge devices, many model compression methods have been proposed. Among them, knowledge distillation, a widely studied topic, aims to train a lightweight yet capable student network under the guidance of a well-trained but cumbersome teacher network. Most existing knowledge distillation methods focus on how to better define knowledge and force the student to mimic the teacher's representation space, rather than leveraging the powerful teacher architecture to improve the student's intermediate representations so that the student approaches the teacher's proficiency. Since the teacher naturally has stronger feature extraction ability and is available throughout the student's training, we introduce Relay Knowledge Distillation (ReKD) to efficiently boost student performance. Concretely, ReKD relays the student's intermediate representations to the teacher, so that the student is trained interactively with the teacher. By directly exploiting the teacher's feature extraction ability to improve the student's intermediate representations, the teacher's knowledge is implicitly distilled to the student. Extensive experiments on diverse image classification datasets demonstrate that ReKD significantly improves student performance, and can even enable the student to outperform the teacher.
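To make the relay idea concrete, below is a minimal PyTorch sketch of one possible reading of the abstract: the student's intermediate feature is passed through the teacher's later stages via a small channel adapter, and the classification loss on that relayed prediction trains the student's early layers together with the ordinary task loss. The stage split, the 1x1 adapter, and the exact loss terms are illustrative assumptions; the abstract does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stage-wise networks standing in for a real teacher/student pair.
# Each network is split into an "early" and a "late" part so the student's
# intermediate feature can be relayed into the teacher's late part.
class ToyNet(nn.Module):
    def __init__(self, width, num_classes=10):
        super().__init__()
        self.early = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.late = nn.Sequential(
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(width, num_classes),
        )

    def forward(self, x):
        return self.late(self.early(x))


teacher = ToyNet(width=64)   # assume this is pretrained; kept frozen here
student = ToyNet(width=16)
for p in teacher.parameters():
    p.requires_grad_(False)

# Adapter matching the student's channel width to the teacher's (an assumption;
# the actual connector design is not described in the abstract).
adapter = nn.Conv2d(16, 64, kernel_size=1)

optimizer = torch.optim.SGD(
    list(student.parameters()) + list(adapter.parameters()), lr=0.05, momentum=0.9
)

def relay_step(images, labels):
    """One hypothetical training step: the student's intermediate feature is
    relayed through the teacher's later stages, and the resulting loss pushes
    the student's early layers toward features the teacher can classify well."""
    feat_s = student.early(images)                       # student intermediate representation
    logits_s = student.late(feat_s)                      # ordinary student prediction
    logits_relay = teacher.late(adapter(feat_s))         # teacher continues from the student feature

    loss_task = F.cross_entropy(logits_s, labels)        # standard supervised loss
    loss_relay = F.cross_entropy(logits_relay, labels)   # relay loss (form assumed)
    loss = loss_task + loss_relay

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch to show the call shape.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
print(relay_step(x, y))
```

Because the teacher's parameters are frozen, the relay loss only updates the student's early layers and the adapter, which matches the abstract's claim that the teacher's feature extraction ability is used to improve the student's intermediate representations rather than to update the teacher itself.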
