A Rich Feature Fusion Single-Stage Object Detector

Kai Zhang, Yasenjiang Musha, Binglong Si

doi:10.1109/access.2020.3037245

Open Access

https://doi.org/10.1109/access.2020.3037245

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 1
License type: cc-by

Affiliation: Xinjiang University

Abstract

Single-stage object detectors are quick and highly accurate. Depending on how the model is trained, single-stage object detectors either build on a pre-trained backbone network or are trained from scratch. A pre-trained backbone network is associated with propagation sensitivity in both classification and detection. This leads to deviations in learning goals and results in an architecture that is constrained by the classification network and therefore not easy to modify. Training from scratch is not as efficient as using a pre-trained network, mainly due to the limitations of the predefined network system. In this paper, we combine these two approaches to overcome these shortcomings. In the proposed method, a top-down concatenated feature pyramid is built upon a basic FSSD network. The experiments are conducted on the MS COCO and PASCAL VOC data sets. Moreover, we apply VGG16 as the backbone network to further demonstrate the effectiveness of the proposed method, which reaches 33.1 AP on the MS COCO benchmark.

Highlights

Object detection is a rapidly developing research area as it is used in a wide range of applications
There are two types of single-stage object detectors: (i) object detectors based on a pre-trained convolutional neural network, see, e.g., [13, 14, 15]; and (ii) object detectors trained from scratch
SSD [1] inherits from YOLO the idea of converting detection into a regression problem and directly completes target localization and classification; inspired by the anchors in Faster R-CNN [8], it proposes a similar prior box; and, by adding FPN, it predicts targets on feature maps with different receptive fields
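
The prior-box idea mentioned in the last highlight can be illustrated with a minimal NumPy sketch. It assumes one scale per feature map and a small set of aspect ratios; the function name `prior_boxes`, its parameters, and the (cx, cy, w, h) normalized box layout are illustrative choices, not details taken from the paper.

```python
import numpy as np

def prior_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Build prior boxes for a square feature map (illustrative sketch).

    Returns an array of shape (fmap_size * fmap_size * len(aspect_ratios), 4)
    holding (cx, cy, w, h) in coordinates normalized to [0, 1].
    """
    # Centre of every feature-map cell, normalized by the map size.
    centers = (np.arange(fmap_size) + 0.5) / fmap_size
    cy, cx = np.meshgrid(centers, centers, indexing="ij")

    per_ratio = []
    for ar in aspect_ratios:
        # Keep the box area at scale**2 while varying width/height ratio.
        w = scale * np.sqrt(ar)
        h = scale / np.sqrt(ar)
        per_ratio.append(
            np.stack([cx, cy, np.full_like(cx, w), np.full_like(cx, h)], axis=-1)
        )

    # (H, W, A, 4) -> (H * W * A, 4)
    return np.stack(per_ratio, axis=2).reshape(-1, 4)


boxes = prior_boxes(fmap_size=4, scale=0.2)
print(boxes.shape)
```

Each cell of the feature map contributes one box per aspect ratio, so coarser (deeper) maps with larger scales cover large objects and finer maps cover small ones.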

Summary

INTRODUCTION

Object detection is a rapidly developing research area as it is used in a wide range of applications. A pre-trained model generally contains a deep convolutional network structure, so the extracted features are relatively abstract and carry rich semantic information. FSSD is used as the pre-trained model, and the proposed Concatenated Feature Pyramid (CFP) combines FSSD with the scratch network, so that the high-level semantic information of the deep feature maps is extended to the shallow layers of the neural network. We propose an object detector that combines the scratch network and the pre-trained model to enrich the semantic information in the middle and shallow layers of the neural network. This improves detection performance on small objects.

Methods compared: SSD512, FSSD512, DSOD300, YOLO v3-608, DSSD513 [5], RefineDet512, RetinaNet-500, ScratchDet300, RFBNet512, Our300, Our512
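
The general idea of a top-down concatenated pyramid, in which deep semantic features are pushed toward shallower layers, can be sketched in NumPy as follows. This is only an illustration under stated assumptions: nearest-neighbour upsampling, plain channel concatenation, and the helper names `upsample_nearest` and `top_down_concat` are hypothetical, and the summary does not specify the actual CFP layers, channel counts, or how the FSSD and scratch branches are wired together.

```python
import numpy as np

def upsample_nearest(x, out_hw):
    """Nearest-neighbour resize of a (C, H, W) feature map to out_hw."""
    _, h, w = x.shape
    out_h, out_w = out_hw
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows[:, None], cols[None, :]]

def top_down_concat(features):
    """Fuse feature maps top-down by concatenation (illustrative sketch).

    features: list of (C_i, H_i, W_i) arrays ordered shallow -> deep.
    Returns a list in the same order; each map is concatenated along the
    channel axis with the upsampled fused map from the next deeper level.
    """
    fused = [features[-1]]  # the deepest map passes through unchanged
    for feat in reversed(features[:-1]):
        up = upsample_nearest(fused[0], feat.shape[1:])
        fused.insert(0, np.concatenate([feat, up], axis=0))
    return fused


rng = np.random.default_rng(0)
pyramid = [
    rng.standard_normal((8, 8, 8)),   # shallow, high resolution
    rng.standard_normal((16, 4, 4)),
    rng.standard_normal((32, 2, 2)),  # deep, low resolution
]
for level in top_down_concat(pyramid):
    print(level.shape)
```

In this sketch the channel count grows at every level because nothing reduces it; a real detector would typically follow each concatenation with a convolution to mix and compress the channels.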

BASELINE DETECTION FRAMEWORK

EXPERIMENT

CONCLUSION

RESULTS