Abstract

Deep convolutional neural networks (DCNNs) have achieved state-of-the-art performance on land cover classification thanks to their outstanding nonlinear feature extraction ability. DCNNs are usually designed as an encoder–decoder architecture for land cover classification in very high-resolution (VHR) remote sensing images. The encoder captures semantic representations by stacking convolution layers and shrinking the spatial resolution, while the decoder restores spatial information through upsampling and combines it with features from different levels via summation or skip connections. However, a semantic gap remains between features at different levels, and a simple summation or skip connection degrades land-cover classification performance. To overcome this problem, we propose a novel end-to-end network named the Dual Gate Fusion Network (DGFNet) to restrain the impact of the semantic gap. DGFNet has two key components: the Feature Enhancement Module (FEM) and the Dual Gate Fusion Module (DGFM). First, the FEM combines local information with global context to strengthen feature representation in the encoder. Second, the DGFM reduces the semantic gap between different-level features, effectively fusing low-level spatial information and high-level semantic information in the decoder. Extensive experiments conducted on the LandCover dataset and the ISPRS Potsdam dataset demonstrate the effectiveness of the proposed network: DGFNet achieves state-of-the-art performance of 88.87% MIoU on the LandCover dataset and 72.25% MIoU on the ISPRS Potsdam dataset.
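The paper's implementation is not reproduced here, but the two modules can be illustrated with a rough, minimal PyTorch sketch. All class names, gate designs, and channel choices below are our illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureEnhancementModule(nn.Module):
    """Sketch of an FEM: a local 3x3 branch re-weighted by a globally
    pooled context branch. Details are illustrative assumptions."""

    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        # Global branch: squeeze to 1x1, project, then broadcast back
        # over the spatial dimensions as a per-channel weight.
        self.global_ctx = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Local detail modulated by image-level context, plus a residual.
        return self.local(x) * self.global_ctx(x) + x


class DualGateFusionModule(nn.Module):
    """Sketch of a DGFM: two learned gates decide, per pixel, how much
    low-level spatial detail and high-level semantics to keep before
    fusion, instead of a plain summation or skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.low_gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid()
        )
        self.high_gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid()
        )
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, low, high):
        # Upsample the coarse decoder feature to the encoder resolution.
        high = F.interpolate(
            high, size=low.shape[2:], mode="bilinear", align_corners=False
        )
        both = torch.cat([low, high], dim=1)
        # Each stream is re-weighted by its own gate before fusion.
        return self.fuse(self.low_gate(both) * low + self.high_gate(both) * high)


# Usage: fuse a fine encoder feature with a coarse decoder feature.
low = torch.randn(1, 64, 128, 128)
high = torch.randn(1, 64, 32, 32)
out = DualGateFusionModule(64)(low, high)  # -> shape (1, 64, 128, 128)
```

The intuition behind the gating is that where upsampled semantics disagree with fine spatial detail, the learned gates can suppress either stream per pixel, which a plain summation or skip connection cannot do.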

Highlights

  • The rapid development of remote sensing sensors allows diverse access to very high-resolution (VHR) remote sensing images

  • The Dual Gate Fusion Module (DGFM) is proposed to reduce the semantic gap between different-level features, effectively fusing low-level spatial information and high-level semantic information in the decoder

  • Compared with other models, our network achieves the best performance in terms of mean intersection over union (MIoU)

Introduction

The rapid development of remote sensing sensors allows diverse access to very high-resolution (VHR) remote sensing images. Pixel-based land cover classification, known as semantic segmentation, using very high spatial resolution images has significant application value in land resource management [1,2], urban planning [3,4], change detection [5,6], and other fields. Since optical sensors reflect the spectral characteristics of ground targets and show features consistent with the human visual system, optical remote sensing has become the mainstream method for fine land cover mapping. However, the clear yet complex spatial structures in VHR images greatly increase the difficulty of land-cover classification [7]. For pixel-based methods, the spectral information provided by high-resolution images exhibits large intra-class variance and high inter-class similarity, leading to lower land-cover mapping accuracies [8].
