Abstract

The shallow feature maps of the single-shot detector (SSD) lack contextual information, which can limit recognition precision for small objects. In this research, a single-shot detection algorithm based on cyclic attention (CA-SSD) is proposed to construct a fast and accurate detector that efficiently obtains full-image contextual information. Our network is constructed by integrating ResNet-34 with the proposed cyclic attention blocks. This type of building block aggregates different transformations, one of which includes an attention module that uses a long but narrow pooling kernel to acquire horizontal and vertical contextual information for every pixel. By applying a further cyclic operation, each pixel eventually captures full-image dependencies. Our design considers the variability of the gradient, which not only improves the reliability of the cyclic attention block but also reduces the number of parameters. Additionally, by exploring the effects of the stem block and its stride on the performance of ResNet-based SSD algorithms, our network retains more detailed information. For an input size of 300 × 300, CA-SSD attained 82.5% mAP on PASCAL VOC 2007 test, 78.4% mAP on PASCAL VOC 2012 test, and 32.7% mAP on MS COCO. These results surpass the best results achieved with the traditional SSD and other advanced object detection algorithms while maintaining real-time speed.

Highlights

  • Object detection is a fundamental task in the field of computer vision and is critical in many recently developed applications, such as autonomous driving [1], [2], fault detection [3], [4], and medical decision-making [5], [6]

  • To enhance the expressive capability of shallow feature maps, a single-shot detection algorithm based on cyclic attention (CA-SSD) is proposed and a novel module, namely the cyclic attention block (CA block), is designed

  • We report extensive experiments on the PASCAL VOC 2007 and PASCAL VOC 2012 datasets, demonstrating that our CA-SSD has greatly improved detection of small objects compared with the conventional SSD and outperforms many detectors based on ResNet-101, such as R-FCN [18] and Faster R-CNN [13]

Summary

INTRODUCTION

Object detection is a fundamental task in the field of computer vision and is critical in many recently developed applications, such as autonomous driving [1], [2], fault detection [3], [4], and medical decision-making [5], [6].

Hu et al.: Single-Shot Detection Based on Cyclic Attention

... a better trade-off between accuracy and speed. It utilizes feature maps in various layers to predict objects at distinct scales. The SSD [20] uses multiple feature maps of different sizes and makes each layer concentrate on detecting targets of a specific size. It uses two 3 × 3 convolutional layers to calculate the category confidence and position offset of the default bounding boxes and adopts non-maximum suppression (NMS) to filter the redundant boxes in the final prediction results. This means that each position on the feature map Q aggregates global contextual information. At this point, the output obtained from x1 after a 3 × 3 convolutional layer is concatenated with Q. The experimental section below provides more parameter settings, such as the batch size and learning rate.
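
The NMS step mentioned above can be illustrated with a minimal sketch of the standard greedy procedure. This is not the authors' implementation: the function names `iou` and `nms`, the corner-format boxes, and the 0.45 IoU threshold are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    The threshold default is an illustrative assumption, not a value
    taken from the text above.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidate_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    candidate_scores = [0.9, 0.8, 0.7]
    print(nms(candidate_boxes, candidate_scores))
```

In a detector such as the SSD, this filtering would typically be run per class on the decoded default boxes after low-confidence predictions are discarded; the quadratic loop here favors readability over speed.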

EXPERIMENTS
Findings
CONCLUSION