Abstract

As a simple yet effective model compression method, knowledge distillation (KD) learns a small, lightweight student network by transferring valuable knowledge from a pre-trained, cumbersome teacher network. However, existing KD methods usually consider feature knowledge either across different layers or for individual samples, failing to exploit the finer-grained information carried by different channels from the perspective of sample relationships. Meanwhile, the negative influences contained in the teacher knowledge are also not well investigated, especially for response-based knowledge. To address these issues, we devise a novel knowledge distillation approach termed channel correlation-based selective knowledge distillation (CCSKD). Specifically, to distill rich knowledge from feature representations, we not only consider the feature knowledge from different channels for individual samples, but also take into account the relational knowledge built from per-channel features across different samples. Furthermore, to distill positive response-based knowledge, a selective strategy, i.e., selective knowledge distillation, is developed to progressively correct the negative influences from the teacher knowledge during the distillation process. We perform extensive experiments on three image classification datasets, CIFAR-100, Stanford Cars, and Tiny-ImageNet, to demonstrate the effectiveness of the proposed CCSKD, which outperforms recent state-of-the-art methods by a clear margin. Our code is publicly available at https://github.com/gjplab/CCSKD.
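The abstract does not specify the exact loss formulations, so the following is only a minimal PyTorch-style sketch of the two ideas it describes: an inter-sample relational term built from per-channel features, and a selective response-based term that filters out potentially harmful teacher responses. The spatial pooling, normalization, and agreement-with-label selection rule shown here are illustrative assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F


def channel_correlation_loss(f_s, f_t):
    """Illustrative sketch: match inter-sample correlations of per-channel features.

    f_s, f_t: student/teacher feature maps of shape (B, C, H, W).
    Each channel is spatially pooled to a (B, C) descriptor, and the
    B x B sample-similarity matrices built from these channel-wise
    descriptors are aligned between student and teacher.
    """
    s = f_s.mean(dim=(2, 3))          # (B, C) per-channel descriptors
    t = f_t.mean(dim=(2, 3))
    s = F.normalize(s, dim=0)         # normalize each channel across the batch
    t = F.normalize(t, dim=0)
    corr_s = s @ s.t()                # (B, B) relational matrix, student
    corr_t = t @ t.t()                # (B, B) relational matrix, teacher
    return F.mse_loss(corr_s, corr_t)


def selective_kd_loss(logits_s, logits_t, labels, T=4.0):
    """Illustrative sketch of a selective response-based KD term: keep only
    teacher predictions that agree with the ground truth, limiting the
    negative influence of incorrect teacher responses."""
    keep = logits_t.argmax(dim=1).eq(labels)      # samples where teacher is correct
    if keep.sum() == 0:
        return logits_s.new_zeros(())
    p_t = F.softmax(logits_t[keep] / T, dim=1)
    log_p_s = F.log_softmax(logits_s[keep] / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```

In a training loop, these two terms would typically be weighted and added to the standard cross-entropy loss on the student's predictions; the actual weighting and selection schedule used by CCSKD are described in the full paper.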
