Abstract

Small targets occupy few pixels and carry few features, so most object detection algorithms cannot make effective use of their edge and semantic information in the feature map. The result is low detection accuracy, with frequent missed and false detections. To address the lack of small-target feature information in RetinaNet, this work introduces a parallel auxiliary multi-scale feature enhancement module, MFEM (Multi-scale Feature Enhancement Model), which uses dilated convolutions with different dilation rates in place of repeated downsampling. MFEM thereby avoids the information loss that repeated downsampling causes, and helps the shallow layers extract multi-scale context information. This work also adopts a backbone network improvement designed specifically for detection tasks, which effectively preserves small-target information in high-level feature maps. The traditional top-down pyramid structure focuses on transferring high-level semantics from the top to the bottom, and this one-way information flow is not conducive to detecting small targets. In this work, the auxiliary MFEM branch is combined with RetinaNet to build a model with a bidirectional feature pyramid network, which effectively integrates the strong semantic information of the high-level layers with the high-resolution information of the low-level layers. The bidirectional feature pyramid network is a symmetrical structure with a top-down branch and a bottom-up branch, which together transfer and fuse strong semantic information and high-resolution information. To demonstrate the effectiveness of the resulting algorithm, FE-RetinaNet (Feature Enhancement RetinaNet), this work conducts experiments on MS COCO. Compared with the original RetinaNet, FE-RetinaNet improves detection accuracy (mAP) on MS COCO by 1.8%, reaching a COCO AP of 36.2%; it is also effective on small targets, with APs increased by 3.2%.
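
The trade-off the abstract describes, enlarging the receptive field with dilation instead of downsampling, can be checked with the standard convolution size arithmetic. The following is a minimal sketch, not the paper's implementation: the function names, the 3x3 kernel, the 64-pixel input, and the dilation rates (1, 2, 3) are illustrative assumptions, since the abstract does not state the rates MFEM uses.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_output_size(size: int, kernel_size: int, dilation: int = 1,
                     stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution along one axis (floor convention)."""
    span = effective_kernel_size(kernel_size, dilation)
    return (size + 2 * padding - span) // stride + 1


INPUT_SIZE = 64            # assumed feature-map side length
ASSUMED_RATES = (1, 2, 3)  # hypothetical dilation rates, not from the paper

# For a 3x3 kernel, padding equal to the dilation rate keeps the resolution.
dilated = {
    d: (effective_kernel_size(3, d),
        conv_output_size(INPUT_SIZE, 3, dilation=d, padding=d))
    for d in ASSUMED_RATES
}

# A stride-2 convolution, by contrast, halves the resolution.
strided = conv_output_size(INPUT_SIZE, 3, stride=2, padding=1)

for d, (span, out) in dilated.items():
    print(f"dilation={d}: kernel span={span}, output size={out}")
print(f"stride=2: output size={strided}")
```

Under these assumptions the dilated branches cover spans of 3, 5, and 7 pixels while keeping the 64-pixel resolution, whereas the strided convolution reduces it to 32. This is the sense in which parallel dilated branches can gather multi-scale context without discarding the pixels that small targets depend on.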

Highlights

  • The task of object detection has always been one of the main tasks in the field of computer vision

  • Compared with the original RetinaNet, FE-RetinaNet improves detection accuracy on MS COCO by 1.8%, reaching a COCO AP of 36.2%; it is also effective on small targets, with APs increased by 3.2%

  • RetinaNet [11] adds the shallow layer to the deep layer with strong semantics, but because small targets have already disappeared from the deep feature map, a large part of their semantic information is still lost

Summary

Introduction

Object detection has always been one of the main tasks in computer vision. Fusing high-level feature maps that carry strong semantics with low-level high-resolution feature maps transfers high-level semantic information to the lower layers, and this multi-scale fusion improves the accuracy of the network. We propose a simple and effective parallel multi-scale feature enhancement module, which uses dilated convolution to enlarge the receptive field without downsampling and helps the backbone network extract shallow features with multi-scale context information. We introduce a method to improve the backbone network for the detection task, which effectively narrows the gap between feature extraction networks for detection and for classification. This method enables the high-level feature map of the backbone network to preserve the texture information of small targets as much as possible while retaining large receptive fields and strong semantic information. Unlike most bidirectional structures, which reuse the backbone network to extract features in the additional branches, this article uses the multi-scale feature enhancement module as the input of the additional branches, which brings new feature information to the network.
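
The two-way information flow described above can be illustrated with a toy example. This is a sketch under stated assumptions rather than the FE-RetinaNet architecture: 1-D lists stand in for feature maps, element-wise addition is assumed as the fusion operator, max-pooling is assumed for the bottom-up step, and all function names are hypothetical. The only detail taken from the text is that the bottom-up branch is fed by a separate auxiliary input instead of reusing the backbone features.

```python
def upsample2(x):
    """Nearest-neighbour upsampling by 2 of a 1-D feature list."""
    return [v for v in x for _ in range(2)]


def downsample2(x):
    """Max-pool by 2 of an even-length 1-D feature list (assumed operator)."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


def add(a, b):
    """Element-wise sum of two equally sized feature lists."""
    assert len(a) == len(b)
    return [p + q for p, q in zip(a, b)]


def top_down(levels):
    """Pass coarse (semantic) features down to finer levels.

    `levels` is ordered from fine (high resolution) to coarse.
    """
    out = [None] * len(levels)
    out[-1] = levels[-1]
    for i in range(len(levels) - 2, -1, -1):
        out[i] = add(levels[i], upsample2(out[i + 1]))
    return out


def bottom_up(levels):
    """Pass fine (high-resolution) features up to coarser levels."""
    out = [levels[0]]
    for i in range(1, len(levels)):
        out.append(add(levels[i], downsample2(out[i - 1])))
    return out


def bidirectional(backbone_levels, aux_levels):
    """Fuse a top-down pass over backbone features with a bottom-up
    pass over auxiliary features, level by level."""
    td = top_down(backbone_levels)
    bu = bottom_up(aux_levels)
    return [add(t, b) for t, b in zip(td, bu)]


# Three pyramid levels with lengths 8, 4, 2 (fine to coarse).
backbone = [[1] * 8, [2] * 4, [3] * 2]
auxiliary = [[1] * 8, [0] * 4, [0] * 2]  # stand-in for the auxiliary branch
fused = bidirectional(backbone, auxiliary)
print(fused)
```

In this toy run the finest level receives contributions from every coarser backbone level through the top-down pass, while the coarsest level receives the fine auxiliary signal through the bottom-up pass, so every fused level mixes semantic and high-resolution information.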

Related Work
Method
Overall Architecture
Improved
Improved Backbone Network ResNet-D
Dilated
Based the original
Multi-Scale Feature Enhancement Module
Bidirectional Feature Pyramid Network
Experiments
Datasets and Evaluation Metrics
Experiments on COCO Object Detection
Ablation Research
Small Target Detection Performance Comparison
Findings
Comparisons
