Abstract

In recent years, single-stage detectors have developed rapidly; however, their detection precision is still relatively low compared with that of multi-stage detectors. This paper analyzes and compares single-stage and multi-stage detectors in detail, revealing that single-stage detectors suffer from several problems, including feature loss and inaccurate feature extraction. To alleviate these deficiencies, this paper proposes a novel detection model, dubbed Optimized Network (OptNet). OptNet consists of three modules: a pyramid of attention features, feature alignment, and consistency supervision (CS). The pyramid of attention features, built on feature pyramid networks (FPNs), introduces a novel branch named attention FPN (AtFPN), which aggregates the multi-layer features of the backbone network and optimizes the object features with lightweight attention modules. AtFPN alleviates the loss of feature pyramid information and the blocked feature transmission between adjacent layers, and it also provides global information to the model. The feature alignment module uses object location information to align the anchor box with the feature, guiding the network to extract precise object features. Finally, CS accelerates network optimization and reduces the semantic differences between features on different layers. In the detection stage, OptNet uses the first detection result to optimize the model's prediction and improve accuracy. Experiments on the MS COCO 2017 dataset demonstrate that OptNet significantly improves detection precision.
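The abstract does not give the formulation of the feature alignment module. One common way to align a convolution with an anchor, similar in spirit to the adaptive convolution of Cascade RPN, is to compute per-tap offsets that move a regular 3×3 kernel onto a grid spread evenly over the box; a deformable convolution would then sample at those shifted positions. The sketch below assumes this approach, a kernel centred on the box centre, and an `(x1, y1, x2, y2)` pixel box format; the function name `aligned_offsets` is hypothetical and is not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in image pixels


def aligned_offsets(box: Box, stride: int, k: int = 3) -> List[Tuple[float, float]]:
    """Hypothetical alignment step: return (dx, dy) offsets, in feature-map cells,
    that move each tap of a regular k x k kernel (centred on the box centre) to the
    centre of the matching bin when the box is split into k x k bins."""
    x1, y1, x2, y2 = (c / stride for c in box)  # box in feature-map coordinates
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    offsets = []
    for i in range(k):  # kernel rows
        for j in range(k):  # kernel columns
            # Regular kernel tap: taps are one cell apart around the centre.
            reg_x = cx + (j - (k - 1) / 2)
            reg_y = cy + (i - (k - 1) / 2)
            # Box-aligned tap: centre of bin (i, j) inside the box.
            tgt_x = x1 + (j + 0.5) * w / k
            tgt_y = y1 + (i + 0.5) * h / k
            offsets.append((tgt_x - reg_x, tgt_y - reg_y))
    return offsets


# Usage: a 48x48-pixel box on a stride-8 level spans 6x6 cells, so the
# corner taps spread outward by one cell while the centre tap stays put.
print(aligned_offsets((0.0, 0.0, 48.0, 48.0), stride=8))
```

A box that is exactly `k` cells wide needs no offsets at all, which is a quick sanity check that the regular kernel and the box-aligned grid coincide in that case.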

Highlights

  • Object detection is one of the basic fields of computer vision

  • To eliminate the influence of multi-scale features on the detector, Optimized Network (OptNet) uses the ground-truth label and ground-truth box of each object in the image to supervise the learning of the feature pyramid

  • In the same experimental environment, the AP of OptNet with different backbone networks is 0.9% to 1.1% higher than that of RetinaNet, demonstrating that OptNet can effectively improve both classification and localization performance
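The highlights state that OptNet supervises the feature pyramid with each object's ground-truth label and box, but not how. A minimal sketch, assuming an auxiliary classifier on every pyramid level whose logits come from features pooled inside the ground-truth box (pooling not shown), is to apply the same label's cross-entropy at every level and average the results, so that all levels are pushed toward the same semantics. The names `cross_entropy` and `consistency_loss` are hypothetical and only illustrate this assumed design.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one object, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def consistency_loss(level_logits: List[Sequence[float]], label: int) -> float:
    """Hypothetical CS-style auxiliary loss.

    level_logits[i] holds the class logits predicted from pyramid level i for one
    ground-truth object (assumed: from features pooled inside its ground-truth box).
    Every level is supervised with the same ground-truth label, and the per-level
    losses are averaged.
    """
    if not level_logits:
        raise ValueError("need logits from at least one pyramid level")
    return sum(cross_entropy(lg, label) for lg in level_logits) / len(level_logits)


# Usage: three pyramid levels, four classes, ground-truth class 2.
levels = [[0.1, 0.2, 2.0, -1.0], [0.0, 0.0, 0.5, 0.0], [1.5, 0.0, 0.2, 0.0]]
print(round(consistency_loss(levels, label=2), 4))
```

Averaging rather than summing keeps the loss scale independent of the number of pyramid levels; a level whose features disagree with the label (the third one above) contributes the largest term.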


Summary

INTRODUCTION

Object detection is one of the basic fields of computer vision. Its core task is to identify and localize objects of interest in images. With the rapid development of deep learning in recent years, a number of state-of-the-art deep-learning-based detectors have been proposed. These object detection algorithms can be broadly divided into two categories: multi-stage detectors [1,2,3,4,5] and single-stage detectors [6,7,8,9]. […] assigned the objects to specific layers for detection. These methods improved the precision of single-stage detectors, but compared with multi-stage detection, single-stage detection still has the following disadvantage: single-stage object detectors typically use the object features extracted by the CNN directly to predict the category and location of each anchor box, which results in relatively low detection precision.
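To make "directly predict the category and location of the anchor box" concrete, the sketch below shows what a single-stage head's raw outputs are turned into for one anchor: per-class sigmoid scores (as in RetinaNet's classification subnet) and a box obtained by applying `(dx, dy, dw, dh)` regression deltas with the centre/log-size parameterization common to R-CNN-family detectors and RetinaNet. Many implementations additionally scale the deltas by fixed normalization weights, which are omitted here; the function names `decode` and `class_scores` are illustrative and not from the paper.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def decode(anchor: Box, deltas: Sequence[float]) -> Box:
    """Apply (dx, dy, dw, dh) deltas to an anchor: shift the centre by a fraction
    of the anchor size and scale width/height by exp(dw), exp(dh)."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + 0.5 * aw, ay1 + 0.5 * ah
    dx, dy, dw, dh = deltas
    cx, cy = acx + dx * aw, acy + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def class_scores(logits: Sequence[float]) -> List[float]:
    """Independent per-class sigmoid scores for one anchor."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Usage: shift a 10x10 anchor right by 10% of its width.
print(decode((0.0, 0.0, 10.0, 10.0), (0.1, 0.0, 0.0, 0.0)))  # (1.0, 0.0, 11.0, 10.0)
print(class_scores([0.0, 2.0]))
```

Both outputs depend only on the features at the anchor's location, which is the limitation the paragraph above points to: nothing in this step adapts the features to where the object actually is.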

RELATED WORK
Method
LOSS FUNCTION
EXPERIMENTS
Findings
CONCLUSIONS AND FURTHER WORKS