Abstract

Knowledge distillation (KD) has proven to be a simple and effective tool for training compact dense prediction models. Lightweight student networks are trained with extra supervision transferred from large teacher networks. Most previous KD variants for dense prediction tasks align the activation maps of the student and teacher networks in the spatial domain, typically by normalizing the activation values at each spatial location and minimizing point-wise and/or pair-wise discrepancies. In contrast to these methods, we propose to normalize the activation map of each channel to obtain a soft probability map. By simply minimizing the Kullback–Leibler (KL) divergence between the channel-wise probability maps of the two networks, the distillation process pays more attention to the most salient regions of each channel, which are valuable for dense prediction tasks. We conduct experiments on several dense prediction tasks, including semantic segmentation and object detection. The experiments demonstrate that our proposed method outperforms state-of-the-art distillation methods considerably, while requiring less computational cost during training. In particular, we improve the RetinaNet detector (ResNet50 backbone) by 3.4% in mAP on the COCO dataset, and PSPNet (ResNet18 backbone) by 5.81% in mIoU on the Cityscapes dataset. Code is available at: https://git.io/Distiller
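To make the channel-wise idea concrete, below is a minimal sketch of such a distillation loss in PyTorch. It assumes the student and teacher feature maps have already been projected to the same shape (N, C, H, W); the function name `channel_wise_kd_loss` and the temperature `tau` are illustrative choices, not values taken from the paper.

```python
# Minimal sketch of a channel-wise distillation loss (assumes PyTorch).
import torch
import torch.nn.functional as F

def channel_wise_kd_loss(student_feat, teacher_feat, tau=4.0):
    """KL divergence between channel-wise soft probability maps.

    Each channel's activations are flattened over spatial locations and
    turned into a probability distribution with a softmax, so the loss
    emphasizes the most salient regions of every channel.
    """
    n, c, h, w = student_feat.shape
    # Flatten spatial dimensions: (N, C, H*W)
    s = student_feat.view(n, c, -1)
    t = teacher_feat.view(n, c, -1)
    # Channel-wise soft probability maps (softmax over spatial locations)
    log_p_s = F.log_softmax(s / tau, dim=-1)
    p_t = F.softmax(t / tau, dim=-1)
    # KL(teacher || student), averaged over batch and channels;
    # tau**2 rescales gradients as in standard temperature-scaled KD
    loss = F.kl_div(log_p_s, p_t, reduction="sum") * (tau ** 2) / (n * c)
    return loss
```

In practice this loss would be added, with a weighting factor, to the task loss (e.g. the segmentation or detection loss) when training the student.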
