Abstract

Although semantic segmentation of remote-sensing (RS) images using deep-learning networks has recently demonstrated its effectiveness, obtaining RS images under consistent conditions to construct labeled data is difficult compared with natural-image datasets. Such small datasets limit the effective learning of deep-learning networks. To address this problem, we propose a combined U-net model that is trained with a combined weighted loss function and can handle heterogeneous datasets. The network consists of encoder and decoder blocks: the convolutional layers that form the encoder blocks are shared between the heterogeneous datasets, whereas the decoder blocks are assigned separate training weights. Herein, the International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam and Cityscape datasets are used as the RS and natural-image datasets, respectively. When the layers are shared, only the visible bands of the ISPRS Potsdam data are used. Experimental results show that when same-sized heterogeneous datasets are used, the semantic segmentation accuracy on the Potsdam data obtained with the proposed method is lower than that obtained using only the Potsdam data (four bands) with other methods, such as SegNet, DeepLab-V3+, and a simplified version of U-net. However, the segmentation accuracy on the Potsdam images improves when the larger Cityscape dataset is used. The combined U-net model can thus effectively train on heterogeneous datasets and overcome the problem of insufficient training data for RS-image datasets. Furthermore, the proposed method is expected to be applicable not only to the segmentation of aerial images but also to other tasks that exploit large heterogeneous datasets.
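As a concrete illustration of the described architecture, the following is a minimal PyTorch sketch of a combined U-net with a shared encoder and per-dataset decoders. All names, channel widths, depths, and class counts here are illustrative assumptions, not the authors' implementation; a full model would use the complete U-net encoder and decoder depth with skip connections at every level.

    import torch
    import torch.nn as nn

    def conv_block(in_ch, out_ch):
        # Two 3x3 convolutions with ReLU: the basic U-net building block.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    class Decoder(nn.Module):
        # One decoder branch with its own weights and its own class count.
        def __init__(self, num_classes):
            super().__init__()
            self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
            self.dec = conv_block(128, 64)  # 64 upsampled + 64 skip channels
            self.head = nn.Conv2d(64, num_classes, 1)

        def forward(self, skip, bottom):
            x = torch.cat([self.up(bottom), skip], dim=1)
            return self.head(self.dec(x))

    class CombinedUNet(nn.Module):
        # The encoder is shared by both datasets; each dataset has its own
        # decoder, matching the shared-encoder idea in the abstract.
        def __init__(self, rs_classes=6, street_classes=19):  # assumed counts
            super().__init__()
            self.enc1 = conv_block(3, 64)  # only visible (RGB) bands are shared
            self.pool = nn.MaxPool2d(2)
            self.enc2 = conv_block(64, 128)
            self.dec_rs = Decoder(rs_classes)          # Potsdam branch
            self.dec_street = Decoder(street_classes)  # Cityscape branch

        def forward(self, x, branch):
            s1 = self.enc1(x)
            bottom = self.enc2(self.pool(s1))
            dec = self.dec_rs if branch == "rs" else self.dec_street
            return dec(s1, bottom)

Under this sketch, a Potsdam batch would be passed as model(x, "rs") and a Cityscape batch as model(x, "street"), so gradients from both datasets update the shared encoder while each decoder is trained only on its own data.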

Highlights

  • Semantic segmentation involves the allocation of a semantic label to each pixel of an image containing an object, which can deliver high-level structure information [1]

  • The network is composed of encoder and decoder blocks; the encoder blocks were shared between the two different datasets (Potsdam and Cityscape), and the network was updated using the combined weighted loss function (a sketch follows this list)

  • The experimental results indicated that when training with identically sized Potsdam data (RGB bands) and Cityscape data, the overall accuracy (OA) decreased compared with single models trained using only the original Potsdam data
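For concreteness, below is a minimal sketch of one plausible form of the combined weighted loss: a weighted sum of per-dataset segmentation losses. The per-dataset weights and the use of cross-entropy are assumptions for illustration, since the exact weighting scheme is not specified here.

    import torch.nn.functional as F

    def combined_weighted_loss(logits_rs, labels_rs,
                               logits_street, labels_street,
                               w_rs=0.5, w_street=0.5):
        # Per-dataset cross-entropy terms, combined with dataset weights so a
        # single backward pass updates the shared encoder from both datasets.
        loss_rs = F.cross_entropy(logits_rs, labels_rs)
        loss_street = F.cross_entropy(logits_street, labels_street)
        return w_rs * loss_rs + w_street * loss_street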

Introduction

Semantic segmentation involves the allocation of a semantic label to each pixel of an image containing an object, which can deliver high-level structure information [1]. Deep-learning models can learn high-level abstract features from raw images with excellent performance; however, these approaches rely on large numbers of training samples [3]. To satisfy this requirement, various public datasets have been proposed for scene labeling. The Cityscape dataset provides a semantic understanding of urban street scenes [7]. It contains 5000 images with dense pixel-level labeling of more than 30 classes of scenes commonly encountered during driving, such as vehicles, roads, and fences.
