Abstract
Semantic segmentation, which refers to pixel-wise classification of an image, is a fundamental topic in computer vision owing to its growing importance in the robot vision and autonomous driving sectors. It provides rich information about objects in the scene such as object boundary, category, and location. Recent methods for semantic segmentation often employ an encoder-decoder structure using deep convolutional neural networks. The encoder part extracts features of the image using several filters and pooling operations, whereas the decoder part gradually recovers the low-resolution feature maps of the encoder into a full input resolution feature map for pixel-wise prediction. However, the encoder-decoder variants for semantic segmentation suffer from severe spatial information loss, caused by pooling operations or stepwise convolutions, and does not consider the context in the scene. In this paper, we propose a novel dense upsampling convolution method based on a guided filter to effectively preserve the spatial information of the image in the network. We further propose a novel local context convolution method that not only covers larger-scale objects in the scene but covers them densely for precise object boundary delineation. Theoretical analyses and experimental results on several benchmark datasets verify the effectiveness of our method. Qualitatively, our approach delineates object boundaries at a level of accuracy that is beyond the current excellent methods. Quantitatively, we report a new record of 82.86% and 81.62% of pixel accuracy on ADE20K and Pascal-Context benchmark datasets, respectively. In comparison with the state-of-the-art methods, the proposed method offers promising improvements.
Highlights
Published: 20 March 2021Image semantic segmentation, which corresponds to pixel-wise classification of an image, is a vital topic in computer vision
Our method presents a good ability in finding missing parts of smalland large-scale objects in complex scenes, as demonstrated by the ADE20K
We have addressed the problem of spatial information loss and missing contextual details for image semantic segmentation using deep learning
Summary
Image semantic segmentation, which corresponds to pixel-wise classification of an image, is a vital topic in computer vision. It provides a comprehensive scenery description of the given image, including the information of object category, position, and shape. The breakthrough of deep learning on various high-level computer vision tasks such as image classification [2,3] and object detection [4,5] has motivated computer vision scholars to explore the capabilities of such algorithms for pixel-level labeling problems such as semantic segmentation. Adapting convolutional neural networks (CNNs) for the task of semantic segmentation allows us to obtain rich details of object categories and scene semantics in an image
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.