Abstract

To tackle multi-scale object detection, recent detectors usually adopt hierarchical feature pyramids generated by naive combinations of top-down and lateral features. Because the methods used to augment top-down features have limited effective receptive fields, the generated regions are associated only with fixed areas of the coarser features. Meanwhile, irrelevant regions inevitably introduce noisy features, since the finer features are tied to rigid coarser regions. Thus, pyramidal features with strong semantics are difficult to obtain by simply enlarging the top-down features. In this paper, we present the Aggregated Residual Dilation based Feature Pyramid Network (ARDFPN) to exploit the inherent correlation of regions in the feature pyramid. The network is designed by stacking a building block that aggregates a set of dilated convolutions with the same topology. We show that carefully adding transformation stages to the feature pyramid offers a potential way to further improve multi-scale feature generation. Treating ARDFPN as an intuitive extension of the Feature Pyramid Network (FPN), we conduct an exhaustive study that evaluates model performance when FPN is replaced with the proposed ARDFPN in both object detection and instance segmentation tasks. With a residual network backbone in the Faster R-CNN and Mask R-CNN frameworks, ARDFPN outperforms the prevalent FPN module on the challenging COCO dataset without bells and whistles. In particular, the gains are most pronounced for small and medium-sized objects.

Highlights

  • In recent decades, methods based on Convolutional Neural Networks (CNNs) [1]–[5] have proven effective for the object detection task [6]

  • We present a fully convolutional network, the Aggregated Residual Dilation based Feature Pyramid Network (ARDFPN), which leverages region correlation in each pyramidal feature generation block, following the split-transform-merge philosophy

  • We empirically demonstrate that, regardless of the backbone convolutional architecture, ARDFPN outperforms the original FPN module in both object detection and instance segmentation tasks by using the information in hierarchical feature maps more efficiently
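The split-transform-merge idea behind the aggregated dilation block can be illustrated with a small 1-D sketch. This is not the authors' implementation: the function names, the use of summation as the merge step, and the identity shortcut are all assumptions made for illustration, and a real block would operate on 2-D feature maps with learned kernels.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Size-preserving 1-D dilated convolution (cross-correlation form).

    Assumes an odd-length kernel and zero padding of dilation * (k - 1) // 2
    on each side, so the output has the same length as the input.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def aggregated_dilation_block(signal, kernels, dilations):
    """Hypothetical block: split into branches, transform, merge, add shortcut.

    Split: the same input feeds every branch (same topology per branch).
    Transform: each branch applies a convolution with its own dilation rate.
    Merge: branch outputs are summed (an assumption; concatenation is another option).
    Residual: the input is added back through an identity shortcut (assumed).
    """
    branches = [dilated_conv1d(signal, k, d) for k, d in zip(kernels, dilations)]
    merged = [sum(values) for values in zip(*branches)]
    return [x + m for x, m in zip(signal, merged)]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    box = [1.0, 1.0, 1.0]
    print(aggregated_dilation_block(impulse, [box, box], [1, 2]))
```

Feeding a unit impulse through two branches with dilation rates 1 and 2 shows how each branch gathers context from a different neighbourhood while the output keeps the input's length.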

Summary

INTRODUCTION

Convolutional Neural Network (CNN) based methods [1]–[5] have proven effective for the object detection task [6]. Considering the weak semantics of hand-crafted features, recent deep learning based object detectors have started to pay attention to multi-scale detection by incorporating hierarchical feature maps into networks [11]–[14], [26]–[29]. The Receptive Field Block (RFB) module uses an aggregated spatial array of receptive fields, built from multi-branch convolution kernels with different dilated convolutional layers, to simulate the properties of population receptive fields (pRFs). To enlarge the receptive field while keeping the spatial size of the feature maps, Li et al. [39] introduced a dedicated backbone network named DetNet that adopts dilated convolutions, which significantly boosts the detection accuracy for large objects.
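The claim that dilated convolutions enlarge the receptive field while keeping the spatial size can be checked with the standard convolution arithmetic. The sketch below uses illustrative helper names (not taken from the paper) and assumes stride 1 and an odd kernel size for the size-preserving padding.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_output_size(in_size, kernel_size, dilation=1, stride=1, padding=0):
    """Output length of a convolution along one spatial axis."""
    k_eff = effective_kernel_size(kernel_size, dilation)
    return (in_size + 2 * padding - k_eff) // stride + 1


def same_padding(kernel_size, dilation):
    """Padding that preserves the size for stride 1 and an odd kernel."""
    return dilation * (kernel_size - 1) // 2


if __name__ == "__main__":
    # A 3x3 kernel on a 64-pixel axis with growing dilation rates.
    for d in (1, 2, 4):
        p = same_padding(3, d)
        print(
            f"dilation={d} effective_kernel={effective_kernel_size(3, d)} "
            f"padding={p} output={conv_output_size(64, 3, d, 1, p)}"
        )
```

With a 3x3 kernel, dilation rates 1, 2 and 4 cover spans of 3, 5 and 9 pixels respectively, and with the matching padding the output stays at 64 pixels in every case, at no extra parameter cost.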

AGGREGATED DILATION BLOCK
TRANSPOSED RESIDUAL LEARNING
EXPERIMENTS
INSTANCE SEGMENTATION
Findings
CONCLUSION