Abstract

Single-stage object detectors are fast and accurate. Depending on how the model is trained, they either build on a pre-trained backbone network or are trained from scratch. A pre-trained backbone is associated with propagation sensitivity in both classification and detection, which causes the learning objectives to deviate and yields an architecture that is constrained by the classification network and therefore hard to modify. Training from scratch is less efficient than using a pre-trained network, mainly because of the limitations of the predefined network system. In this paper, we combine the two approaches to overcome these shortcomings. In the proposed method, a top-down concatenated feature pyramid is built on a basic FSSD network. Experiments are conducted on the MS COCO and PASCAL VOC data sets. We use VGG16 as the backbone network to further demonstrate the effectiveness of the proposed method, which reaches 33.1 AP on the MS COCO benchmark.

Highlights

  • Object detection is a rapidly developing research area as it is used in a wide range of applications

  • There are two types of single-stage object detectors: (i) detectors based on a pre-trained convolutional neural network, see, e.g., [13, 14, 15]; and (ii) detectors trained from scratch

  • SSD [1] inherits from YOLO the idea of converting detection into a regression problem and performs target localization and classification directly; inspired by the anchors in Faster R-CNN [8], it proposes similar prior boxes; and, by adding FPN, it predicts targets on feature maps with different receptive fields
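The prior-box idea in the last highlight can be illustrated with a short sketch. This is not code from the paper or from any SSD implementation: the function name `prior_boxes`, the default aspect ratios, and the scales in the usage lines are assumptions chosen for illustration. It follows the commonly described SSD convention of placing one box centre per feature-map cell and deriving width and height from a scale and an aspect ratio.

```python
import math

def prior_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return SSD-style prior boxes (cx, cy, w, h) in normalized [0, 1] coordinates.

    fmap_size, scale and aspect_ratios are illustrative parameters (assumed).
    """
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Each cell of the feature map contributes one box centre.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Aspect ratio ar = w / h; the box area stays at scale ** 2.
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

# Coarser feature maps (larger receptive fields) get fewer, larger boxes.
fine = prior_boxes(fmap_size=4, scale=0.2)
coarse = prior_boxes(fmap_size=2, scale=0.5)
print(len(fine), len(coarse))
```

Pairing small scales with fine feature maps and large scales with coarse ones is what lets a single forward pass cover objects of different sizes.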


Summary

INTRODUCTION

Object detection is a rapidly developing research area because it is used in a wide range of applications. A pre-trained model generally contains a deep convolutional network, so the features it extracts are relatively abstract and rich in semantic information. We use FSSD as the pre-trained model and use the proposed Concatenated Feature Pyramid (CFP) to combine FSSD with the scratch network, so that the high-level semantic information of the deep feature maps is extended to the shallow layers of the neural network. The resulting object detector combines the scratch network and the pre-trained model to enrich the semantic information in the middle and shallow layers, which improves detection performance on small objects.

Methods compared: SSD512, FSSD512, DSOD300, YOLO v3-608, DSSD513 [5], RefineDet512, RetinaNet-500, ScratchDet300, RFBNet512, Ours300, Ours512.
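The idea of extending deep semantic information to shallow layers through top-down concatenation can be sketched as follows. This is a minimal illustration, not the paper's CFP: the helper names, the nearest-neighbour upsampling, the absence of any convolution or channel reduction, and the toy shapes are all assumptions. Feature maps are plain nested lists shaped [channels][height][width].

```python
def constant_map(channels, height, width, value):
    """Build a toy [C][H][W] feature map filled with one value (illustrative)."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]

def shape(fmap):
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a [C][H][W] map by an integer factor."""
    return [
        [[row[x // factor] for x in range(len(row) * factor)]
         for row in channel for _ in range(factor)]
        for channel in fmap
    ]

def top_down_concat(shallow, deep):
    """Upsample `deep` to the size of `shallow`, then concatenate along channels."""
    _, h_s, w_s = shape(shallow)
    _, h_d, w_d = shape(deep)
    if h_s % h_d or w_s % w_d or h_s // h_d != w_s // w_d:
        raise ValueError("shallow size must be an integer multiple of deep size")
    return shallow + upsample_nearest(deep, h_s // h_d)

def build_pyramid(levels):
    """levels are ordered shallow -> deep; returns fused maps in the same order."""
    fused = [levels[-1]]  # the deepest map is kept as is
    for fmap in reversed(levels[:-1]):
        # Each shallower level receives everything fused below it.
        fused.insert(0, top_down_concat(fmap, fused[0]))
    return fused

levels = [constant_map(2, 4, 4, 1.0),   # shallow, high resolution
          constant_map(3, 2, 2, 2.0),   # middle
          constant_map(4, 1, 1, 3.0)]   # deep, low resolution
fused = build_pyramid(levels)
print([shape(f) for f in fused])  # [(9, 4, 4), (7, 2, 2), (4, 1, 1)]
```

In this toy version the shallowest map ends up with channels from every deeper level, which is the sense in which deep semantics reach the shallow layers. A real detector would typically follow each concatenation with convolutions to control the channel count.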

BASELINE DETECTION FRAMEWORK
EXPERIMENT
CONCLUSION
RESULTS
