Abstract

Object detection is challenging because object scales are highly diverse. Modern object detectors generally use a feature pyramid to learn multi-scale representations. However, current feature pyramids handle scale imbalance poorly, because they integrate semantic information across scales inefficiently. In this paper, we reformulate feature pyramid construction as a feature reconfiguration process. We then propose a novel detection network, the Multi-level Refinement Feature Pyramid Network (MRFPN), which combines high-level features (i.e., semantic information), middle-level features, and low-level features (i.e., boundary information) in a highly nonlinear yet efficient manner. In particular, we propose a novel contextual feature module, chained parallel pooling, which consists of global attention and lightweight local reconfiguration and efficiently gathers task-oriented contextual features across scales and spatial locations. To evaluate the proposed model, we designed and trained a robust end-to-end single-stage detector, MRFDet, by integrating MRFPN into a conventional SSD model. MRFDet achieves better detection performance than most recent single-stage object detectors. In particular, it achieves an AP of 45.2 on MS-COCO and improves mAP on VOC by 4.5% over conventional SSD. We release the source code of MRFDet to support the research community.

Highlights

  • Object detection becomes more challenging as the scale of object instances varies [1,2,3]

  • The main contributions of this work are summarized as follows: 1) We propose a multi-level refinement feature pyramid network (MRFPN) for object detection with lower computational complexity

  • 2) For the first time, Chained Parallel Pooling is used during the construction of the feature pyramid to improve robustness and to capture contextual information from a large image region, followed by prediction layers for object detection

Summary

INTRODUCTION

Object detection becomes more challenging as the scale of object instances varies [1,2,3]. The main contributions of this work are summarized as follows: 1) We propose a multi-level refinement feature pyramid network (MRFPN) for object detection with lower computational complexity. It exploits features from multiple levels and recursively refines the shallow features to generate middle-level and deeper feature maps. 2) For the first time, Chained Parallel Pooling is used during the construction of the feature pyramid to improve robustness and to capture contextual information from a large image region, followed by prediction layers for object detection. For this purpose, the features are efficiently pooled with several window sizes and merged with learnable weights and residual connections. MRFDet can be applied to both PASCAL VOC 07/12 and MS COCO, where it achieves state-of-the-art performance.
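
The pooling step described above (several window sizes in parallel, a weighted merge, and a residual connection) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single-channel feature map stored as a list of lists, stride-1 max pooling with odd window sizes, fixed scalar weights standing in for learnable ones, and that "chained" means several such blocks applied in sequence. All function names are hypothetical.

```python
def max_pool_same(x, k):
    """Stride-1 max pooling with odd window size k.

    The window is clipped at the borders, so the output has the
    same height and width as the input.
    """
    h, w = len(x), len(x[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [x[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


def parallel_pool_block(x, window_sizes, weights):
    """Pool x with several window sizes in parallel, merge the results
    with one scalar weight per branch, and add the input back (residual)."""
    pooled = [max_pool_same(x, k) for k in window_sizes]
    h, w = len(x), len(x[0])
    return [[x[i][j] + sum(wt * p[i][j] for wt, p in zip(weights, pooled))
             for j in range(w)]
            for i in range(h)]


def chained_parallel_pooling(x, window_sizes, weights_per_block):
    """Apply one parallel pooling block per weight tuple, in sequence
    (assumed reading of 'chained')."""
    for weights in weights_per_block:
        x = parallel_pool_block(x, window_sizes, weights)
    return x


# Toy 3x3 feature map with a single activation in the centre.
feature_map = [[0, 0, 0],
               [0, 1, 0],
               [0, 0, 0]]
refined = chained_parallel_pooling(
    feature_map,
    window_sizes=(3, 5),
    weights_per_block=((0.5, 0.5), (0.5, 0.5)),
)
```

In a real detector the scalar weights would be trained parameters (for example, convolution kernels) and the pooling would run over multi-channel tensors. The sketch only shows how parallel branches with different window sizes spread context from one location while the residual term preserves the original activation.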

RELATED WORK
4) OBJECTIVE LOSS FUNCTION
DISCUSSION
Findings
CONCLUSION