Abstract

State-of-the-art CNN-based detection models are difficult to deploy on mobile devices with limited computing power because they contain many redundant parameters and demand excessive computation; knowledge distillation, a practical model compression approach, can alleviate this limitation. Previous feature-based knowledge distillation algorithms focused on transferring hand-selected local features and therefore failed to fully capture the global information in images. To address these shortcomings of traditional feature distillation, we first improve GAMAttention to learn global feature representations; the improved attention mechanism minimizes the information loss incurred when processing features. Second, instead of manually defining which features should be transferred, we propose a more interpretable approach in which the student network learns to imitate the high-response feature regions predicted by the teacher network. This makes the model more end-to-end, and the feature transfer allows the student network to mimic the teacher in generating semantically strong feature maps, improving the detection performance of the small model. To avoid learning excessive noise when distilling background features, the foreground and background components of the feature distillation loss are assigned different weights. Finally, logit distillation is performed between the prediction heads of the student and teacher networks. In our experiments, we chose YOLOv5 as the base architecture for the teacher–student pairs. Improving YOLOv5s with attention and knowledge distillation ultimately yields a 1.3% performance gain on VOC and a 1.8% performance gain on KITTI.
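To make the loss structure concrete, the following is a minimal PyTorch sketch of the kind of objective the abstract describes, not the authors' released implementation. The function name `distillation_loss`, the construction of the foreground mask, and the hyperparameters `alpha`, `beta`, and `temperature` are all assumptions introduced for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                      fg_mask, alpha=1.0, beta=0.5, temperature=4.0):
    """Hedged sketch of the combined distillation objective.

    student_feat / teacher_feat: (N, C, H, W) feature maps from matched stages
        (the teacher's channels are assumed already projected to the student's).
    student_logits / teacher_logits: (N, K) prediction-head outputs.
    fg_mask: (N, 1, H, W) binary mask, 1 on foreground (high-response) regions,
        0 on background; how the mask is built (e.g., from the teacher's
        attention response) is an assumption.
    alpha, beta: foreground vs. background weights. The abstract only states
        that the two parts are weighted differently, with background
        down-weighted to suppress noisy features.
    """
    # Per-location squared error between student and teacher features.
    per_pixel = (student_feat - teacher_feat).pow(2).mean(dim=1, keepdim=True)

    # Foreground regions receive a larger weight than background regions.
    fg_loss = (per_pixel * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
    bg_mask = 1.0 - fg_mask
    bg_loss = (per_pixel * bg_mask).sum() / bg_mask.sum().clamp(min=1.0)
    feat_loss = alpha * fg_loss + beta * bg_loss

    # Logit distillation on the prediction heads (standard softened-KL form).
    T = temperature
    logit_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    return feat_loss + logit_loss

# Toy shapes only; real YOLOv5 heads emit per-anchor box/objectness/class terms.
s_feat, t_feat = torch.randn(2, 128, 20, 20), torch.randn(2, 128, 20, 20)
s_log, t_log = torch.randn(2, 20), torch.randn(2, 20)
mask = (torch.rand(2, 1, 20, 20) > 0.8).float()
loss = distillation_loss(s_feat, t_feat, s_log, t_log, mask)
```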
