Abstract

Recently, model compression has been widely used to deploy cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational knowledge distillation methods usually build sample correlations directly from the feature maps at a certain middle layer of a deep neural network, which tends to overfit the feature maps of the teacher model and fails to focus on the most important sample regions. Motivated by this observation, we argue that the characteristics of important regions are of great importance and thus introduce attention maps to construct sample correlations for knowledge distillation. Specifically, with attention maps from multiple middle layers, attention-based sample correlations are built upon the most informative sample regions and serve as a novel and effective form of relational knowledge for distillation. We refer to the proposed method as multi-level attention-based sample correlations for knowledge distillation (MASCKD). We perform extensive experiments on popular knowledge distillation datasets for image classification, image retrieval, and person re-identification, where the experimental results demonstrate the effectiveness of the proposed method for relational knowledge distillation.
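To make the described idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes attention maps are computed as channel-wise sums of squared activations (as in attention transfer), that pairwise sample correlations are inner products of normalized attention maps, and that teacher and student correlation matrices are matched with an MSE loss across several middle layers. The names `attention_map`, `sample_correlation`, and `masckd_loss` are illustrative.

```python
import torch
import torch.nn.functional as F


def attention_map(feat: torch.Tensor) -> torch.Tensor:
    # Collapse the channel dimension into a spatial attention map
    # (sum of squared activations), then flatten and L2-normalize
    # per sample. feat: (B, C, H, W) -> (B, H*W).
    att = feat.pow(2).sum(dim=1).flatten(1)
    return F.normalize(att, dim=1)


def sample_correlation(att: torch.Tensor) -> torch.Tensor:
    # Pairwise sample similarity built from attention maps: (B, B).
    return att @ att.t()


def masckd_loss(teacher_feats, student_feats) -> torch.Tensor:
    # Hypothetical multi-level relational loss: at each selected middle
    # layer, match the student's attention-based sample correlation
    # matrix to the teacher's (teacher detached, no gradient).
    loss = torch.tensor(0.0)
    for f_t, f_s in zip(teacher_feats, student_feats):
        corr_t = sample_correlation(attention_map(f_t.detach()))
        corr_s = sample_correlation(attention_map(f_s))
        loss = loss + F.mse_loss(corr_s, corr_t)
    return loss
```

In practice, this relational term would be added to the usual task loss (and possibly a logit-distillation term) with a weighting coefficient; the specific layers, normalization, and loss weighting used in MASCKD are design choices detailed in the full paper.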
