Abstract

Semantic segmentation of high-resolution remote-sensing (RS) images is a fundamental task for RS-based urban understanding and planning. However, the various types of artificial objects in urban areas make this task quite challenging. Recently, Deep Convolutional Neural Networks (DCNNs) with multiscale information fusion have demonstrated great potential for improving performance. Technically, however, existing fusions are usually implemented by summing or concatenating feature maps in a straightforward way; few works consider the spatial importance of pixels when aggregating global-to-local context information. This paper proposes a Learnable-Gated CNN (L-GCNN) to address this issue. Methodologically, the Taylor expansion of the information-entropy function is first parameterized to design the gate function, which generates pixelwise weights for coarse-to-fine refinement in the L-GCNN; a Parameterized Gate Module (PGM) is designed to achieve this goal. Then, the single PGM and its densely connected extension are embedded at different levels of the encoder in the L-GCNN to help identify discriminative feature maps at different scales. With these designs, the L-GCNN is organized as a self-cascaded, end-to-end architecture that sequentially aggregates context information for fine segmentation. The proposed model was evaluated on two challenging public benchmarks: the ISPRS 2D semantic segmentation challenge Potsdam dataset and the Massachusetts building dataset. The experimental results demonstrate that the proposed method yields significant improvements over several related segmentation networks, including FCN, SegNet, RefineNet, PSPNet, DeepLab, and GSN. For example, on the Potsdam dataset, our method achieved a 93.65% F1 score and an 88.06% IoU score for the segmentation of tiny cars in high-resolution RS images.
In conclusion, the proposed model shows potential for segmenting objects such as buildings, impervious surfaces, low vegetation, trees, and cars from RS images of urban settings, where objects vary greatly in size and have easily confused appearances.
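The abstract does not reproduce the exact gate function, but the underlying idea can be illustrated: the binary information entropy H(p) = -p·log(p) - (1-p)·log(1-p) admits a Taylor expansion about p = 0.5 whose truncated coefficients could be treated as the parameters of a polynomial gate that weights pixels by their uncertainty. The minimal NumPy sketch below is an assumption-laden illustration of this approximation, not the paper's implementation; the function names and the choice to fix the coefficients at their analytic Taylor values (rather than learning them) are ours:

```python
import numpy as np

def entropy(p):
    # Binary information entropy H(p) = -p*log(p) - (1-p)*log(1-p).
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def parameterized_gate(p, coeffs):
    """Polynomial gate g(p) = sum_k c_k * (p - 0.5)**(2k).

    `coeffs` plays the role of the learnable parameters described in the
    abstract; here they are fixed to the analytic Taylor coefficients of
    H(p) around p = 0.5 purely for illustration.
    """
    x = p - 0.5
    return sum(c * x ** (2 * k) for k, c in enumerate(coeffs))

# Taylor series of H about p = 0.5:
#   H(0.5 + x) = ln 2 - sum_{n>=1} (2x)^(2n) / (2n*(2n - 1))
coeffs = [np.log(2)] + [-(4.0 ** n) / (2 * n * (2 * n - 1)) for n in range(1, 6)]

p = np.linspace(0.05, 0.95, 19)
approx = parameterized_gate(p, coeffs)
exact = entropy(p)
print(float(np.max(np.abs(approx - exact))))  # small approximation error
```

With only five even-order terms, the polynomial tracks the entropy closely over [0.05, 0.95]; in a learnable gate, the coefficients would instead be fitted end-to-end so the weighting need not remain exactly entropy-shaped.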

Highlights

  • With the rapid development of global observation technologies, a large number of remote-sensing (RS) images with high spatial resolution can be acquired every day

  • The quantitative scores of F1 and Intersection over Union (IoU) obtained by the seven models on all test images in the Potsdam dataset are listed in Tables 2 and 3, respectively

  • The proposed L-GCNN is superior to the Pyramid Scene Parsing Network (PSPNet), RefineNet, and DeepLab models, which were all designed around multiscale information fusion over large receptive fields


Summary

Introduction

With the rapid development of global observation technologies, a large number of remote-sensing (RS) images with high spatial resolution can be acquired every day. The main challenge lies in that, in the absence of prior knowledge about the image itself and the motivation behind segmentation for applications, there is no general way to instruct a computer on how to group visual patterns of colors, textures, and other features. To this end, this paper mainly focuses on the fundamental task of semantic segmentation in high-resolution RS images obtained by airborne sensors.

