Abstract

Object detection based on a single-level feature map is challenging because of the limited feature scale, so enriching the multiscale information of single-level features is a promising way to address this challenge. Although most existing methods attempt to augment the scale range of single-level features, detection performance remains unsatisfactory because these methods mine multiscale features from only one level of the feature hierarchy. To address this problem, we propose a multiple-in-single-out network (MiSoNet) that integrates multiscale information from multilevel feature maps into a single-level feature map. To achieve this, MiSoNet's key component comprises two cascaded modules: a multilevel feature integration module (MFIM) and a depthwise convolutional residual encoder (DWEncoder). Specifically, MFIM adaptively fuses features of inconsistent semantics and scales from multilevel feature maps. DWEncoder stacks several residual blocks with depthwise convolutions to extract multiscale contexts from the single feature map, further extending the scale range of the receptive fields. Extensive experiments are conducted on the Common Objects in Context (COCO) dataset, where MiSoNet achieves 41.0 AP, surpassing YOLOF by 1.4 AP with negligible computational overhead. Moreover, with fewer parameters and FLOPs, MiSoNet outperforms several advanced detectors based on the feature pyramid network.
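To make the DWEncoder idea more concrete, the following is a minimal sketch of a residual block built around a dilated depthwise convolution, with several such blocks stacked to widen the receptive-field range on a single-level feature map. The class name, layer ordering, channel width, and dilation choices here are illustrative assumptions, not the exact configuration described in the paper.

```python
import torch
import torch.nn as nn

class DWResidualBlock(nn.Module):
    """Hypothetical residual block using a depthwise convolution.

    A dilated depthwise conv enlarges the receptive field at low cost;
    stacking blocks with different dilations covers multiple scales.
    The actual DWEncoder structure in MiSoNet may differ.
    """

    def __init__(self, channels: int, dilation: int = 1):
        super().__init__()
        self.block = nn.Sequential(
            # depthwise 3x3 conv: groups == channels
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation,
                      groups=channels, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # pointwise 1x1 conv mixes information across channels
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual connection keeps the original single-level features
        return self.act(x + self.block(x))


# Toy encoder: stack blocks with growing dilations so the single-level
# feature map accumulates contexts at multiple receptive-field scales.
encoder = nn.Sequential(*[DWResidualBlock(256, d) for d in (1, 2, 4, 8)])
feat = torch.randn(1, 256, 32, 32)   # e.g. a fused single-level feature map
out = encoder(feat)                  # same shape, richer multiscale context
```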
