Multi-Scale Receptive Field Detection Network

Haoren Cui, Zhihua Wei

doi:10.1109/access.2019.2942077

Open Access

https://doi.org/10.1109/access.2019.2942077

Abstract

Deep convolutional neural networks have contributed much to various computer vision problems, including object detection. However, many problems remain unsolved, and scale variation across object instances is one of the major challenges for object detection. In this paper, we propose a multi-scale receptive field detection network (MS-RFDN), a one-stage approach for detecting objects of different scales in an image. The proposed network combines predictions at different scales from feature maps of different scales and receptive fields. To generate scale-specific feature maps in a specific layer, we design a scale-specific concatenation module (SSC module). These scale-specific feature maps are merged from the dense block and the dilated block, which have receptive fields of the same size. Through our multi-scale layer network structure and scale-specific feature maps, our model achieves a significant improvement in small object detection. On the VOC 2007 test dataset, our method comes close to the performance of state-of-the-art one-stage methods, which confirms the effectiveness of our model.

Highlights

In recent years, a lot of progress has been made in object detection due to the emergence of deep convolutional neural networks (CNNs)
To alleviate the problem arising from scale variation, multiple solutions have been proposed
We propose a novel framework to deal with scale variation for object detection

Summary

INTRODUCTION

A lot of progress has been made in object detection due to the emergence of deep convolutional neural networks (CNNs). SSD [6] attempts to use the CNN's pyramidal feature hierarchy, detecting objects of different scales at each feature layer. Most object detection frameworks deal with the challenge of scale variation by constructing feature maps with receptive fields of different sizes, which they extract through convolution and pooling operations. We propose a scale-specific concatenation module that merges feature maps with similar receptive field sizes obtained in different ways. This module is simple to implement yet effective, and it works well with DenseNet and other backbones. These proven techniques help us improve the performance of our model.
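To make the notion of "the same size of receptive field" concrete, the following is a minimal sketch of the standard receptive field arithmetic for a stack of convolution or pooling layers. The function name `receptive_field` and the two layer stacks are hypothetical illustrations, not the configurations used in MS-RFDN. It shows that two stacked 3x3 convolutions and a single 3x3 convolution with dilation 2 cover the same 5x5 input window, which is the kind of match that would allow two paths to be concatenated as scale-specific features.

```python
from typing import Iterable, Tuple

# Each layer is described as (kernel_size, stride, dilation).
Layer = Tuple[int, int, int]


def receptive_field(layers: Iterable[Layer]) -> int:
    """Return the receptive field (in input pixels) of the last layer's output."""
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent output positions
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical paths, chosen only to illustrate matching receptive fields.
dense_path = [(3, 1, 1), (3, 1, 1)]  # two stacked 3x3 convolutions
dilated_path = [(3, 1, 2)]           # one 3x3 convolution with dilation 2

print(receptive_field(dense_path), receptive_field(dilated_path))
```

Both paths report a receptive field of 5 under these assumptions. Feature maps produced this way describe the same spatial extent of the input, so concatenating them along the channel axis does not mix scales.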

EXPERIMENT

Findings

CONCLUSION