Abstract

The single shot multi-box detector (SSD) exhibits low accuracy in small-object detection because it does not consider the scale contextual information between its layers and because its shallow layers lack adequate semantic information. To improve the accuracy of the original SSD, this paper proposes a new single shot multi-box detector using trident feature and squeeze and extraction feature fusion (SSD-TSEFFM); this detector employs the trident network and the squeeze and excitation feature fusion module. A trident feature module (TFM), inspired by the trident network, is developed to consider the scale contextual information. Because this module applies dilated convolution, it makes the proposed model robust to scale changes. Further, the squeeze and excitation block feature fusion module (SEFFM) is used to provide more semantic information to the model. The SSD-TSEFFM is compared with the faster regions with convolutional neural network features (Faster R-CNN) (2015), SSD (2016), and DF-SSD (2020) on the PASCAL VOC 2007 and 2012 datasets. The experimental results demonstrate the high accuracy of the proposed model in small-object detection, in addition to a good overall accuracy. The SSD-TSEFFM achieved 80.4% mAP and 80.2% mAP on the PASCAL VOC 2007 and 2012 datasets, respectively. This indicates an average improvement of approximately 2% over other models.

Highlights

  • Object detection is an important area of computer vision with numerous applications in several fields such as autonomous driving [1], face detection [2], medical imaging [3], 3D reconstruction [4], optical character recognition [5], and action recognition [6]

  • By integrating every prediction layer with its corresponding deconvolution layer, the contextual information can be injected into shallow layers, which leads to an improvement in the accuracy of small-object detection, as the resolution of the feature maps is enhanced

  • The largest increase in mean average precision (mAP) was observed when the trident feature module (TFM) was applied to Conv7

Summary

Introduction

Object detection is an important area of computer vision with numerous applications in fields such as autonomous driving [1], face detection [2], medical imaging [3], 3D reconstruction [4], optical character recognition [5], and action recognition [6]. In [16], the SSD performed classification and localization of objects using anchor boxes of different sizes at multiple scales by extracting feature maps of various sizes. However, the model shows relatively low performance in the detection of small objects. To address this problem, a module called the trident feature module (TFM) is proposed. Shallow layers are specialized for detecting small objects via the extraction of feature maps with high resolution. The performance of object detection can be improved by reusing the feature maps. In this way, the semantic information of shallow layers can be effectively reinforced. The proposed model, SSD-TSEFFM, addresses the challenges encountered in the detection of small objects. It is compared with existing object detection models, such as the faster regions with convolutional neural network features (Faster R-CNN) [12], SSD, and DF-SSD [20].
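The role of dilated convolution in the TFM can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional signal, the function names, and the dilation rates (1, 2, 3) are assumptions chosen for illustration, and the same weights are reused at every rate only to show how a fixed kernel covers a wider input span as the dilation grows. A kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, weights, dilation):
    """1-D dilated cross-correlation with stride 1 and no padding."""
    span = effective_kernel_size(len(weights), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, w in enumerate(weights):
            # Kernel taps are spaced `dilation` positions apart.
            acc += w * signal[start + k * dilation]
        out.append(acc)
    return out


def trident_branches(signal, weights, dilations=(1, 2, 3)):
    """Apply the same weights at several dilation rates (hypothetical helper)."""
    return {d: dilated_conv1d(signal, weights, d) for d in dilations}


if __name__ == "__main__":
    signal = list(range(10))
    weights = [1.0, 1.0, 1.0]
    for d, out in trident_branches(signal, weights).items():
        print(d, effective_kernel_size(len(weights), d), out)
```

With a 3-tap kernel, the three branches span 3, 5, and 7 input positions, so each branch responds to context at a different scale while the number of weights stays the same.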

SSD Series
Object Detectors for Scale-Variance
Feature Pyramid Network
Proposed Model
Training Setting
Results on
TFM Application Results
Conclusions