Abstract

Semantic image segmentation has recently witnessed considerable progress by training deep convolutional neural networks (CNNs). The core issue of this technique is the limited capacity of CNNs to depict visual objects. Existing approaches tend to utilize approximate inference in a discrete domain or additional aides and do not have a global optimum guarantee. We propose the use of the multi-label manifold ranking (MR) method in solving the linear objective energy function in a continuous domain to delineate visual objects and solve these problems. We present a novel embedded single stream optimization method based on the MR model to avoid approximations without sacrificing expressive power. In addition, we propose a novel network, which we refer to as dual multi-scale manifold ranking (DMSMR) network, that combines the dilated, multi-scale strategies with the single stream MR optimization method in the deep learning architecture to further improve the performance. Experiments on high resolution images, including close-range and remote sensing datasets, demonstrate that the proposed approach can achieve competitive accuracy without additional aides in an end-to-end manner.

Highlights

  • Semantic image segmentation, which aims to classify each pixel into one of the given categories, is an important task for understanding [1,2,3] and inferring objects [4,5,6] and their observed relations in a scene

  • The results demonstrate the superiority of the combination of multi-scale (MS), broader receptive field (Dilated), and manifold ranking optimization (MR-Opti) strategies, which can more accurately classify each pixel with varying spatial resolutions

  • We present a dual multi-scale manifold ranking (DMSMR) network for semantic image segmentation in a continuous domain

Read more

Summary

Introduction

Semantic image segmentation, which aims to classify each pixel into one of the given categories, is an important task for understanding [1,2,3] and inferring objects [4,5,6] and their observed relations in a scene. Over the last five years, remarkable success in the semantic scene labeling area has been gained through the usage of convolutional neural networks (CNNs) [20,21,22,23,24,25,26] in dense prediction. Coarse pixel-wise labeling is obtained by multi-scale and dilation strategies, whereas the fine segmentation is conducted by optionally integrating contextual information into the output map. Active research has been conducted on these aspects, semantic image segmentation remains a challenging issue because of the complexity. The EvLab-SS dataset poses more challenge to researchers.

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call