Abstract

Object detection based on a single-level feature map is challenging because a single level covers only a limited range of feature scales. Enriching the multiscale information of single-level features is therefore considered a promising way to address this challenge. Most existing methods attempt to broaden the scale range of single-level features, but their detection performance remains unsatisfactory because they mine multiscale features from only one level of the feature hierarchy. To address this problem, we propose a multiple-in-single-out network (MiSoNet) that integrates multiscale information from multilevel feature maps into a single-level feature map. The key component of MiSoNet consists of two cascaded modules: a multilevel feature integration module (MFIM) and a depthwise convolutional residual encoder (DWEncoder). Specifically, MFIM adaptively fuses features of inconsistent semantics and scales from multilevel feature maps. DWEncoder stacks several residual blocks with depthwise convolutions to extract multiscale contexts from the single fused feature map, which further extends the scale range of the receptive fields. Extensive experiments on the Common Objects in Context (COCO) dataset show that MiSoNet achieves 41.0 AP, surpassing YOLOF by 1.4 AP with negligible computational overhead. Moreover, MiSoNet outperforms some advanced detectors based on the feature pyramid network while using fewer parameters and FLOPs.
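The claim that stacked residual blocks extend the scale range of the receptive fields can be illustrated with a small arithmetic sketch. It is not the authors' implementation. It assumes each residual block contains one stride-1 depthwise convolution with a 3×3 kernel, and the dilation rates `(1, 2, 4)` are hypothetical, since the abstract does not state kernel sizes or dilations. Because a residual block computes `x + f(x)`, a signal path may either skip a block or pass through its convolution, so the output mixes several receptive-field sizes instead of only the largest one.

```python
def conv_rf_growth(kernel_size: int, dilation: int) -> int:
    """Pixels that one stride-1 convolution adds to the receptive field."""
    return (kernel_size - 1) * dilation


def plain_stack_receptive_field(dilations, kernel_size=3):
    """Receptive field of a plain (non-residual) stack: one single size."""
    return 1 + sum(conv_rf_growth(kernel_size, d) for d in dilations)


def residual_stack_receptive_fields(dilations, kernel_size=3):
    """Receptive-field sizes reachable through a stack of residual blocks.

    Each block either contributes its convolution or is bypassed by the
    identity shortcut, so every subset of blocks yields one path.
    """
    sizes = {1}  # the all-identity path sees a single position
    for d in dilations:
        growth = conv_rf_growth(kernel_size, d)
        # Keep the paths that skip this block and add those that use it.
        sizes |= {s + growth for s in sizes}
    return sorted(sizes)


if __name__ == "__main__":
    dilations = (1, 2, 4)  # hypothetical values, chosen for illustration
    print(plain_stack_receptive_field(dilations))      # 15
    print(residual_stack_receptive_fields(dilations))  # [1, 3, 5, 7, 9, 11, 13, 15]
```

Under these assumptions, a plain stack sees only a 15-position window along each axis, while the residual stack covers every odd size from 1 to 15, all measured on the input feature map of the encoder. Pointwise (1×1) convolutions, if present in a block, would not change these sizes.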
