Abstract
Model compression is widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. Knowledge distillation (KD), a simple yet effective model compression technique, transfers knowledge (e.g., sample relationships, known as relational knowledge) from a large teacher model to a small student model. However, existing relational knowledge distillation methods usually build sample correlations directly from the feature maps at a particular middle layer of a deep neural network, which tends to overfit the teacher's feature maps and fails to focus on the most important sample regions. Motivated by this limitation, we argue that the characteristics of important regions are crucial and introduce attention maps to construct sample correlations for knowledge distillation. Specifically, using attention maps from multiple middle layers, we build attention-based sample correlations on the most informative sample regions and use them as a novel and effective form of relational knowledge for distillation. We refer to the proposed method as multi-level attention-based sample correlations for knowledge distillation (MASCKD). Extensive experiments on popular knowledge distillation datasets for image classification, image retrieval, and person re-identification demonstrate the effectiveness of the proposed method for relational knowledge distillation.
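To make the idea of attention-based sample correlations concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the exact formulas, so every specific here is an assumption: the attention map is taken as the channel-wise sum of squared activations, the sample correlation as the cosine similarity between flattened attention maps within a batch, and the distillation loss as the mean squared difference between teacher and student correlation matrices, summed over layers. The function names `attention_map`, `sample_correlation`, and `multi_level_correlation_loss` are hypothetical.

```python
import numpy as np


def attention_map(features):
    """Map (batch, channels, height, width) features to (batch, height*width) attention.

    Assumption: spatial attention is the channel-wise sum of squared
    activations, flattened and L2-normalized per sample.
    """
    att = np.sum(features ** 2, axis=1)            # (batch, height, width)
    att = att.reshape(att.shape[0], -1)            # (batch, height*width)
    norm = np.linalg.norm(att, axis=1, keepdims=True)
    return att / np.maximum(norm, 1e-12)           # avoid division by zero


def sample_correlation(att):
    """Return the (batch, batch) cosine similarity between samples' attention maps."""
    return att @ att.T


def multi_level_correlation_loss(teacher_feats, student_feats):
    """Sum, over layers, the MSE between teacher and student correlation matrices.

    Each argument is a list of feature arrays, one per middle layer.
    Assumption: the layers are paired one-to-one between teacher and student.
    """
    total = 0.0
    for t, s in zip(teacher_feats, student_feats):
        corr_t = sample_correlation(attention_map(t))
        corr_s = sample_correlation(attention_map(s))
        total += np.mean((corr_t - corr_s) ** 2)
    return float(total)


# Usage with random stand-in features from two middle layers (batch of 4).
rng = np.random.default_rng(0)
teacher = [rng.normal(size=(4, 8, 6, 6)), rng.normal(size=(4, 16, 3, 3))]
student = [rng.normal(size=(4, 2, 6, 6)), rng.normal(size=(4, 4, 3, 3))]
print(multi_level_correlation_loss(teacher, student))
```

Because each correlation matrix has shape batch by batch, the teacher and student layers in this sketch may differ in channel count and even in spatial resolution; only the batch size must match.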