Abstract

Knowledge distillation (KD), as an efficient and effective model compression technique, has received considerable attention in deep learning. The key to its success is about transferring knowledge from a large teacher network to a small student network. However, most existing KD methods consider only one type of knowledge learned from either instance features or relations via a specific distillation strategy, failing to explore the idea of transferring different types of knowledge with different distillation strategies. Moreover, the widely used offline distillation also suffers from a limited learning capacity due to the fixed large-to-small teacher-student architecture. In this article, we devise a collaborative KD via multiknowledge transfer (CKD-MKT) that prompts both self-learning and collaborative learning in a unified framework. Specifically, CKD-MKT utilizes a multiple knowledge transfer framework that assembles self and online distillation strategies to effectively: 1) fuse different kinds of knowledge, which allows multiple students to learn knowledge from both individual instances and instance relations, and 2) guide each other by learning from themselves using collaborative and self-learning. Experiments and ablation studies on six image datasets demonstrate that the proposed CKD-MKT significantly outperforms recent state-of-the-art methods for KD.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.