Abstract

In this paper, we explore the idea of distilling small networks for the object detection task. More specifically, we propose a two-stage approach to learning more compact and efficient detectors under the single-shot object detection framework by leveraging knowledge distillation. In the first stage, the student model learns to imitate the teacher's feature maps for each prediction head. Instead of fitting the whole feature map directly, we propose a mask-guided structure that matches not only the entire feature map (i.e., global features) but also the regions covered by objects (i.e., local features), which significantly improves the performance of the student network. In the second stage, the ground truth is used to further refine the detector. Experimental results on the PASCAL VOC and KITTI datasets demonstrate the effectiveness of our proposed approach. We achieve 56.88% mAP on VOC2007 at 143 FPS with a 1/8 VGG16 backbone.
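
The following is a minimal sketch (not the authors' released code) of how the mask-guided imitation loss described above could be formulated, assuming PyTorch and a hypothetical binary object mask rasterized from the ground-truth boxes onto the feature-map grid; the names mask_guided_distillation_loss, object_mask, global_weight, and local_weight are illustrative, not from the paper.

```python
# Sketch of a mask-guided feature imitation loss: a global term over the
# whole feature map plus a local term restricted to object regions.
# Assumes the student's feature map has already been adapted (e.g., by a
# 1x1 convolution) to the teacher's channel dimension.
import torch
import torch.nn.functional as F


def mask_guided_distillation_loss(student_feat, teacher_feat, object_mask,
                                  global_weight=1.0, local_weight=1.0):
    """student_feat, teacher_feat: (N, C, H, W) features of one prediction head.
    object_mask: (N, 1, H, W) binary mask, 1 inside ground-truth boxes.
    The weights are hypothetical balancing hyperparameters."""
    # Global features: match the entire feature map.
    global_loss = F.mse_loss(student_feat, teacher_feat)

    # Local features: match only the positions covered by objects,
    # normalized by the number of masked feature elements.
    masked_sq_diff = (student_feat - teacher_feat) ** 2 * object_mask
    num_masked = object_mask.sum() * student_feat.size(1) + 1e-6
    local_loss = masked_sq_diff.sum() / num_masked

    return global_weight * global_loss + local_weight * local_loss
```

In the two-stage scheme, a loss of this form would drive the first (imitation) stage for each prediction head, while the second stage fine-tunes the student with the standard detection losses on the ground truth.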
