Abstract

Scale variation is one of the key challenges in object detection. Many state-of-the-art detectors tackle this problem by utilizing feature pyramids. However, current methods of producing feature pyramids remain inefficient at integrating semantic information from other layers. In this work, our motivation is to build a feature pyramid efficiently from selected contextual features by integrating informative features and suppressing useless ones. To achieve this goal, we propose a novel single-stage detection network termed Selective Feature Network (SFNet), which consists of a semantic-enhanced module and a selective feature module. The semantic-enhanced module improves the semantics of the basic pyramid via a lightweight architecture. In conjunction with that, the selective feature module combines features across different channels and scales by an attention mechanism. The resulting contextual feature is then injected into the pyramidal features. Comprehensive experiments are performed on the PASCAL VOC and MS COCO datasets. Results demonstrate that, with a VGG16-based SFNet, our approach obtains significant improvements over competing methods without losing real-time processing speed.
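The abstract does not specify the internals of the selective feature module, but the idea it describes, fusing pyramid levels into a contextual feature and gating it with channel attention, can be sketched generically. The following is a minimal NumPy illustration, not the paper's implementation: the SE-style squeeze-and-excitation attention, the nearest-neighbor upsampling, and all function and parameter names (`channel_attention`, `fuse_pyramid_levels`, `w1`, `w2`, the reduction ratio `r`) are assumptions for illustration only.

```python
import numpy as np

def channel_attention(feature, w1, w2):
    """SE-style channel attention (assumed mechanism, not the paper's):
    squeeze via global average pooling, excite via a two-layer MLP,
    then rescale each channel by its learned weight."""
    squeezed = feature.mean(axis=(1, 2))            # (C,) per-channel statistics
    hidden = np.maximum(0.0, w1 @ squeezed)         # ReLU, reduced to (C//r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate in (0, 1), (C,)
    return feature * weights[:, None, None]         # suppress/emphasize channels

def fuse_pyramid_levels(levels, w1, w2):
    """Resize all pyramid levels (nearest-neighbor) to the largest spatial
    size, sum them into one contextual feature, and gate it with attention."""
    target_h = max(f.shape[1] for f in levels)
    target_w = max(f.shape[2] for f in levels)
    fused = np.zeros((levels[0].shape[0], target_h, target_w))
    for f in levels:
        ry, rx = target_h // f.shape[1], target_w // f.shape[2]
        fused += np.repeat(np.repeat(f, ry, axis=1), rx, axis=2)
    return channel_attention(fused, w1, w2)

# Toy pyramid: 3 levels, 8 channels, strides differing by factors of 2.
rng = np.random.default_rng(0)
C, r = 8, 2
w1 = rng.standard_normal((C // r, C)) * 0.1  # random weights stand in for learned ones
w2 = rng.standard_normal((C, C // r)) * 0.1
levels = [rng.standard_normal((C, 16, 16)),
          rng.standard_normal((C, 8, 8)),
          rng.standard_normal((C, 4, 4))]
context = fuse_pyramid_levels(levels, w1, w2)
print(context.shape)  # (8, 16, 16)
```

The gated `context` tensor would then be added back into each pyramid level, which is what the abstract means by injecting the contextual feature into the pyramidal features.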
