Abstract
Object detection remains a challenging task in computer vision because object appearance varies greatly under cluttered backgrounds, occlusion, truncation, and scale change. Current deep neural network (DNN)-based object detection methods cannot simultaneously achieve high accuracy and high efficiency. To overcome this limitation, in this paper, we propose a novel multi-scale attention (MSA) DNN for accurate object detection with high efficiency. The proposed MSA-DNN method utilizes a novel multi-scale feature fusion module (MSFFM) to construct high-level semantic features. Subsequently, a novel MSA module (MSAM), built on the fused layers of the MSFFM, is introduced to exploit the global semantic information of image-level labels to guide detection. On the one hand, the MSAM captures global semantic information that further enhances the semantic feature representation of the fused layers constructed by the MSFFM, thereby improving detection accuracy. On the other hand, the MSA maps generated by the MSAM can be employed to rapidly and coarsely locate objects at different scales. In addition, an attention-based hard negative mining strategy is introduced to filter out negative samples and reduce the search space, dramatically alleviating the severe class-imbalance problem. Extensive experimental results on the challenging PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO datasets demonstrate that MSA-DNN achieves state-of-the-art detection accuracy while maintaining high efficiency. Furthermore, MSA-DNN significantly improves small-object detection accuracy.
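The abstract does not give implementation details of the MSFFM, but multi-scale feature fusion of this general kind typically upsamples a deeper, semantically stronger feature map to the resolution of a shallower, spatially finer one and combines the two. The sketch below is a minimal illustration of that pattern in NumPy; the function names, nearest-neighbour upsampling, and element-wise-add fusion are assumptions for illustration, not the authors' exact design:

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def fuse(shallow, deep):
    """Fuse a shallow high-resolution map with an upsampled deep map
    by element-wise addition (one common fusion choice)."""
    return shallow + upsample2x(deep)

# Toy feature maps: same channel count, deep map at half resolution.
shallow = np.ones((256, 64, 64))   # fine spatial detail
deep = np.ones((256, 32, 32))      # strong semantics, coarse resolution
fused = fuse(shallow, deep)
print(fused.shape)  # (256, 64, 64)
```

In a real detector the fusion step would usually also include learned 1x1 convolutions to align channel counts and bilinear rather than nearest-neighbour upsampling; the sketch keeps only the resolution-matching-then-combine structure.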
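The attention-based hard negative mining strategy is likewise only named in the abstract. One plausible reading is a two-stage filter: first discard candidate negatives that fall in low-attention regions (shrinking the search space), then keep only the hardest of the remainder. The sketch below encodes that reading; the function name, the attention threshold, and the keep ratio are all hypothetical parameters, not values from the paper:

```python
import numpy as np

def attention_hard_negative_filter(scores, attention, keep_ratio=0.25, attn_thresh=0.1):
    """Select hard negative samples guided by an attention map.

    scores    : per-anchor classification scores (higher = harder negative)
    attention : per-anchor attention values from the MSA maps
    Returns the indices of the retained hard negatives.
    """
    # Stage 1: coarse attention gate drops anchors in low-attention regions.
    candidate = np.where(attention >= attn_thresh)[0]
    # Stage 2: keep only the hardest (highest-scoring) fraction of survivors.
    k = max(1, int(len(candidate) * keep_ratio))
    hardest = candidate[np.argsort(scores[candidate])[::-1][:k]]
    return hardest

# Toy example: five candidate negatives with scores and attention values.
scores = np.array([0.9, 0.1, 0.8, 0.05, 0.6])
attention = np.array([0.5, 0.02, 0.3, 0.4, 0.05])
kept = attention_hard_negative_filter(scores, attention, keep_ratio=0.5)
print(kept)  # [0]
```

Here anchors 1 and 4 are dropped by the attention gate, and of the survivors only the hardest one (anchor 0, score 0.9) is kept, illustrating how attention filtering can shrink the negative pool before standard hard negative mining.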
Published in: IEEE Transactions on Circuits and Systems for Video Technology