Abstract

Several state-of-the-art object detectors achieve outstanding performance by optimizing feature representation through modification of the backbone architecture and exploitation of a feature pyramid. To determine the effectiveness of this approach, we explore modifying object detectors’ backbone and feature pyramid using Neural Architecture Search (NAS) and capsule networks. We introduce two modules: a NAS-gate convolutional module and a Capsule Attention module. The NAS-gate convolutional module optimizes the standard convolutions in a backbone network via differentiable architecture search over multiple convolution conditions to overcome object scale variation problems. The Capsule Attention module exploits the capsule network’s strong ability to encode spatial relationships to generate a spatial attention mask that emphasizes important features and suppresses unnecessary ones in the feature pyramid, thereby improving the feature representation and localization capability of the detectors. Experimental results indicate that the NAS-gate convolutional module alleviates the object scale variation problem and that the Capsule Attention module helps avoid inaccurate localization. We then combine the two modules into NASGC-CapANet. Comparisons against state-of-the-art object detectors on the MS COCO val-2017 dataset show that NASGC-CapANet-based Faster R-CNN significantly outperforms the baseline Faster R-CNN with ResNet-50 and ResNet-101 backbones by 2.7% and 2.0% mAP, respectively. Furthermore, NASGC-CapANet-based Cascade R-CNN achieves a box mAP of 43.8% on the MS COCO test-dev dataset.

Highlights

  • State-of-the-art object detectors optimize feature representation by modifying the backbone architecture and exploiting a feature pyramid; we build on both directions

  • We propose the Neural Architecture Search (NAS)-gate convolutional module, which applies a differentiable architecture search (DARTS)-based operation with multiple kernel sizes and dilation rates to the convolutions of the classification backbone network, reducing the computation cost relative to NAS-based backbones and alleviating the issues arising from object scale variation

  • To mitigate the problems arising from object scale variation, we optimize the backbone’s feature extraction ability by replacing its standard convolutions with the proposed NAS-gate convolutional module, increasing detection performance on multiscale objects at a smaller computation cost than NAS-based object detector backbones (a minimal sketch of the idea follows this list)
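The following is a minimal PyTorch sketch of how such a DARTS-style gated convolution could be structured. It is an assumption-laden illustration: the candidate kernel sizes and dilation rates, the class name NASGateConv, and the single softmax gate over architecture parameters are our own illustrative choices, not the paper’s exact search space or implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NASGateConv(nn.Module):
    """Hypothetical DARTS-style 'gated' convolution: candidate convolutions
    with different kernel sizes and dilation rates run in parallel, and a
    softmax over learnable architecture parameters weights their outputs."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # (kernel_size, dilation) candidates -- illustrative, not the paper's.
        candidates = [(3, 1), (3, 2), (5, 1), (5, 2)]
        # Padding d*(k-1)//2 keeps all candidate outputs the same size.
        self.ops = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, stride=stride,
                      padding=d * (k - 1) // 2, dilation=d, bias=False)
            for k, d in candidates
        )
        # One architecture parameter (alpha) per candidate operation.
        self.alphas = nn.Parameter(torch.zeros(len(candidates)))

    def forward(self, x):
        weights = F.softmax(self.alphas, dim=0)  # gate values over candidates
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Usage: a drop-in replacement for a standard conv inside a backbone block.
x = torch.randn(2, 64, 56, 56)
y = NASGateConv(64, 128)(x)
print(y.shape)  # torch.Size([2, 128, 56, 56])

In a DARTS-style search, the architecture parameters (alphas here) are trained jointly with the network weights; after the search, the highest-weighted candidate can replace the weighted mixture for efficiency.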


Summary

Introduction

There have been several attempts to alleviate the issues arising from scale variation and instances of small objects in object detection, such as proposing new backbone architectures that maintain a high spatial resolution in the deep layers[31–33], modifying convolution by utilizing atrous convolution[26], and adopting an attention mechanism[34]. These approaches have achieved considerably higher detection performance. Adopting capsule attention at the highest level of a feature pyramid network (FPN) or FPN-based methods can alleviate the information loss problem without losing spatial relationships, thereby improving localization ability.
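As a rough illustration of this idea, the sketch below applies a capsule-style spatial attention mask to a single FPN level. It is hypothetical: the primary-capsule layout, the squash-then-length reduction to a one-channel mask, and names such as CapsuleAttention and n_caps are illustrative assumptions rather than the paper’s exact Capsule Attention module.

import torch
import torch.nn as nn

def squash(s, dim=1, eps=1e-8):
    """Capsule squash nonlinearity: scales each vector's length into [0, 1)."""
    sq_norm = (s * s).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

class CapsuleAttention(nn.Module):
    """Hypothetical capsule-based spatial attention for one FPN level: a 1x1
    conv forms primary capsules at every spatial position, their squashed
    lengths (activation strengths) are reduced to a single-channel mask, and
    the mask reweights the input features."""

    def __init__(self, channels, n_caps=8, caps_dim=16):
        super().__init__()
        self.n_caps, self.caps_dim = n_caps, caps_dim
        self.to_caps = nn.Conv2d(channels, n_caps * caps_dim, 1)
        self.to_mask = nn.Conv2d(n_caps, 1, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        caps = self.to_caps(x).view(b, self.n_caps, self.caps_dim, h, w)
        caps = squash(caps, dim=2)                   # squash each capsule vector
        lengths = caps.norm(dim=2)                   # (b, n_caps, h, w)
        mask = torch.sigmoid(self.to_mask(lengths))  # (b, 1, h, w) spatial mask
        return x * mask                              # emphasize/suppress features

# Usage on a 256-channel pyramid level (e.g., the highest FPN level):
p5 = torch.randn(2, 256, 25, 25)
out = CapsuleAttention(256)(p5)
print(out.shape)  # torch.Size([2, 256, 25, 25])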

