Abstract

Object detectors based on deep learning struggle to run smoothly on terminal devices in complex traffic scenes, so model compression has become a research hotspot. In knowledge distillation, the student network learns from a single source, and its dependence on loss-function design leads to parameter sensitivity and related problems. To address these issues, we propose a new knowledge distillation method with a second-order-term attention mechanism and feature fusion of adjacent layers. First, we build a knowledge distillation framework based on YOLOv5 and propose a new attention mechanism in the teacher network backbone to extract attention heat maps. Then, we combine the heat-map features with the next-level features through a fusion module, fusing the useful information of the low convolution layers with the feature maps of the high convolution layers to help the student network obtain the final prediction map. Finally, to improve accuracy on small objects, we add a 160 × 160 detection head and replace the convolutional network of the head with a transformer encoder block. Extensive experiments show that our method achieves state-of-the-art performance: inference speed and parameter count remain unchanged, while the average detection accuracy reaches 97.4% on the KITTI test set and 92.7% on the Cityscapes test set.
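To make the described pipeline concrete, the minimal PyTorch sketch below shows one plausible reading of the two core components: a channel-attention module whose descriptor includes a second-order (squared) statistic alongside the usual first-order mean, and a module that fuses the attention-weighted map with the adjacent higher-level feature map. The class names, the reduction ratio, and the pooling-based spatial alignment are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondOrderAttention(nn.Module):
    """Hypothetical sketch: channel attention built from first- and
    second-order pooled statistics of the feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        mean = x.mean(dim=(2, 3))             # first-order statistic
        second = (x ** 2).mean(dim=(2, 3))    # second-order term
        w = self.fc(torch.cat([mean, second], dim=1)).view(b, c, 1, 1)
        return x * w                          # attention-weighted heat map

class AdjacentFusion(nn.Module):
    """Hypothetical sketch: fuse an attention-weighted low-level map
    with the next (higher) level's feature map."""
    def __init__(self, low_ch: int, high_ch: int):
        super().__init__()
        self.align = nn.Conv2d(low_ch, high_ch, kernel_size=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        low = self.align(low)                               # match channels
        low = F.adaptive_avg_pool2d(low, high.shape[2:])    # match spatial size
        return low + high

# Usage sketch: fuse a low-level teacher map into the adjacent level.
x_low = torch.randn(2, 128, 80, 80)
x_high = torch.randn(2, 256, 40, 40)
attn = SecondOrderAttention(128)
fuse = AdjacentFusion(128, 256)
out = fuse(attn(x_low), x_high)   # -> shape (2, 256, 40, 40)
```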
