Abstract

Semantic segmentation, which aims to assign a semantic label to each pixel, is broadly applied in many fields, such as video surveillance, medical image analysis, and autonomous driving. However, the semantic segmentation task faces two challenges: 1) a deficiency of rich contextual information; and 2) a lack of sufficient spatial information, both of which seriously degrade segmentation performance. To address these two challenges, this paper proposes a global feature capturing module (GFCM) and a Conv Block, which are combined into a new model to improve segmentation performance. Specifically, GFCM, made up of a global encoding module (GEM) and a spatial attention module (SAM), is designed to extract adequate global contextual information and build global spatial dependencies. The Conv Block, composed of three convolution layers, is proposed to preserve rich spatial information. Based on GFCM and the Conv Block, a new model is designed in which a data-dependent upsampling operator (DUpsampling) is exploited to recover the pixel-wise prediction effectively. Extensive experiments demonstrate the effectiveness of the design: the new model achieves 73.69% mIoU on the Cityscapes test set and 80.05% mIoU on the PASCAL VOC 2012 test set without any post-processing.
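To make the DUpsampling idea concrete, below is a minimal NumPy sketch of data-dependent upsampling: instead of bilinear interpolation, each low-resolution feature vector is linearly projected by a learned matrix `W` into an `r x r` patch of class scores, which are then rearranged into the full-resolution prediction. The function name, argument shapes, and variable names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dupsample(features, W, ratio, num_classes):
    """Sketch of data-dependent upsampling (DUpsampling).

    features: (h, w, c) low-resolution feature map
    W:        (c, ratio*ratio*num_classes) learned linear projection
    returns:  (h*ratio, w*ratio, num_classes) dense class scores
    """
    h, w, c = features.shape
    # Project every low-resolution feature vector to r*r sub-pixel scores.
    proj = features.reshape(-1, c) @ W                      # (h*w, r*r*K)
    proj = proj.reshape(h, w, ratio, ratio, num_classes)    # (h, w, r, r, K)
    # Interleave the r*r sub-pixels into the spatial dimensions
    # (the same rearrangement as a pixel-shuffle operation).
    out = proj.transpose(0, 2, 1, 3, 4).reshape(h * ratio, w * ratio, num_classes)
    return out
```

In training, `W` would be learned (for instance, by pre-fitting it to reconstruct ground-truth label maps, as the DUpsampling paper suggests), so the upsampling adapts to the label space rather than interpolating blindly.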
