Abstract

Knowledge distillation (KD) is a powerful and widely applicable technique for compressing deep learning models. The main idea of knowledge distillation is to transfer knowledge from a large teacher model to a small student model, and the attention mechanism has been intensively explored for this purpose owing to its great flexibility in handling different teacher-student architectures. However, existing attention-based methods usually transfer similar attention knowledge from the intermediate layers of deep neural networks, leaving the hierarchical structure of deep representation learning poorly investigated for knowledge distillation. In this paper, we propose a hierarchical multi-attention transfer framework (HMAT), in which different types of attention are utilized to transfer knowledge at different levels of deep representation learning. Specifically, position-based and channel-based attention knowledge characterize the knowledge from low-level and high-level feature representations, respectively, while activation-based attention knowledge characterizes the knowledge from both mid-level and high-level feature representations. Extensive experiments on three popular visual recognition tasks, image classification, image retrieval, and object detection, demonstrate that the proposed HMAT significantly outperforms recent state-of-the-art KD methods.
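The activation-based attention transfer mentioned above can be sketched as follows. This is a minimal NumPy illustration in the spirit of classic attention transfer (squared activations summed over channels, then L2-normalized, with an MSE matching loss between teacher and student maps); the abstract does not specify HMAT's exact losses, so all function names and shapes here are hypothetical assumptions:

```python
import numpy as np

def spatial_attention(feat, eps=1e-8):
    """Activation-based spatial attention map for a (C, H, W) feature tensor.

    Sums squared activations over the channel axis and L2-normalizes the
    flattened map, so teacher and student may have different channel counts.
    """
    amap = (feat ** 2).sum(axis=0).reshape(-1)   # (H * W,)
    return amap / (np.linalg.norm(amap) + eps)

def attention_transfer_loss(f_teacher, f_student):
    """MSE between normalized attention maps (spatial sizes must match)."""
    diff = spatial_attention(f_teacher) - spatial_attention(f_student)
    return float((diff ** 2).sum())

# Hypothetical intermediate features: the student has fewer channels,
# which is fine because attention maps only depend on spatial layout.
rng = np.random.default_rng(0)
f_t = rng.standard_normal((64, 8, 8))   # teacher features (assumed shape)
f_s = rng.standard_normal((16, 8, 8))   # student features (assumed shape)
loss = attention_transfer_loss(f_t, f_s)
```

In a full KD pipeline this term would be computed at several depths and added, with a weighting coefficient, to the usual task loss; position-based and channel-based attention would require their own map definitions, which the abstract does not detail.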
