Abstract

This paper focuses on semantic segmentation of scenes by capturing the appropriate scale, rich detail, and contextual dependencies in a feature representation. Semantic segmentation is a pixel-level classification task that has made steady progress on the basis of fully convolutional networks (FCNs). However, we find there is still room for improvement in the following aspects. First, a pixel by itself does not carry enough information for semantic prediction; it must draw on its surroundings to determine which category it belongs to, yet the fixed receptive field defined by the network is not suitable for every pixel when an image contains objects at various scales. Second, the extracted scale-aware features cannot produce sharp object boundaries because of their low resolution. Third, the network's ability to model long-range dependencies is limited. To address these challenges, we propose three modules: a scale-aware spatial pyramid pooling module, an encoder mask module, and a scale-attention module (SSPP-ES). Extensive experiments on the Cityscapes and ADE20K benchmarks demonstrate the effectiveness of our approach for semantic segmentation.
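The abstract does not specify the internals of the proposed modules, but the scale-aware spatial pyramid pooling idea follows a well-known pattern: parallel branches with different receptive fields, fused into a single multi-scale representation. As a minimal sketch under assumed design choices (ASPP-style dilated branches plus an image-level pooling branch; the class name `ScaleAwareSPP` and all hyperparameters are hypothetical, not taken from the paper), such a module might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareSPP(nn.Module):
    """Illustrative multi-scale pyramid pooling (NOT the paper's exact design).

    Parallel atrous convolutions with different dilation rates capture
    context at several receptive-field sizes; an image-level pooling
    branch adds global context; a 1x1 convolution fuses the branches.
    """

    def __init__(self, in_channels, out_channels, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # Dilation enlarges the receptive field without losing resolution.
                nn.Conv2d(in_channels, out_channels,
                          kernel_size=3 if r > 1 else 1,
                          padding=r if r > 1 else 0,
                          dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Image-level pooling supplies global context for very large objects.
        self.global_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
        )
        self.project = nn.Conv2d(out_channels * (len(rates) + 1),
                                 out_channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        # Upsample the 1x1 global feature back to the spatial grid.
        g = F.interpolate(self.global_pool(x), size=(h, w),
                          mode="bilinear", align_corners=False)
        feats.append(g)
        return self.project(torch.cat(feats, dim=1))
```

In the paper's framing, a scale-attention module would then reweight such branches per pixel rather than fusing them with a fixed 1x1 convolution, letting each pixel select the receptive field appropriate to its object's scale; the sketch above shows only the fixed-fusion baseline.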
