Abstract

Recently, the atrous spatial pyramid module and the encoder-decoder structure have been widely investigated in many computer vision tasks to address the scale-variation challenge in deep convolutional networks. The atrous spatial pyramid module captures local and global cues together by applying various sampling rates in convolutional or pooling layers. The encoder-decoder structure, in turn, propagates context from low-resolution, semantically strong features to high-resolution, semantically weak ones while preserving detailed object boundaries. However, each of these two strategies has its own drawbacks, and previous object detectors employ only one of them to handle scale variation. In this work, we propose to couple atrous spatial pyramid convolution (ASPC) with an encoder-decoder structure (ED) for object detection, termed ASPC-ED, which combines the complementary advantages of both modules in an end-to-end fashion. Specifically, the proposed method consists of three components: an encoder, the ASPC module, and a decoder. Extensive experiments on the PASCAL VOC and MS COCO benchmarks demonstrate that our method achieves state-of-the-art results with various backbones for object detection, instance segmentation, and panoptic segmentation.
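To illustrate the kind of multi-rate sampling the abstract refers to, the following is a minimal sketch of an ASPP-style atrous convolution block in PyTorch. It is not the authors' ASPC-ED implementation: the dilation rates, number of branches, and fusion layer here are illustrative assumptions, since the abstract does not specify them.

import torch
import torch.nn as nn

class AtrousSpatialPyramidConv(nn.Module):
    """Sketch of a multi-rate atrous convolution block (illustrative only)."""

    def __init__(self, in_channels, out_channels, rates=(1, 6, 12, 18)):
        super().__init__()
        # One 3x3 convolution per dilation rate: larger rates enlarge the
        # effective receptive field (global cues) without reducing resolution,
        # while rate 1 keeps local detail.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Fuse the concatenated multi-rate responses back to out_channels.
        self.project = nn.Conv2d(out_channels * len(rates), out_channels,
                                 kernel_size=1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

In an encoder-decoder detector, such a block would typically sit between the encoder's low-resolution output and the decoder, so that the upsampling path starts from features that already aggregate several receptive-field sizes.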
