Abstract

Although the deep semantic segmentation network (DSSN) has been widely used for remote sensing (RS) image semantic segmentation, it still does not fully exploit the spatial relationship cues between objects when extracting deep visual features through convolutional filters and pooling layers. In fact, the spatial distribution of objects from different classes exhibits strong correlations; for example, buildings tend to be close to roads. In view of the strong appearance extraction ability of the DSSN and the powerful topological relationship modeling capability of the graph convolutional neural network (GCN), this paper proposes a DSSN-GCN framework for RS image semantic segmentation that combines the advantages of both. To strengthen appearance extraction, we propose a new DSSN called the attention residual U-shaped network (AttResUNet), which leverages residual blocks to encode feature maps and an attention module to refine the features. For the GCN, a graph is built whose nodes are superpixels and whose edge weights are calculated from the spectral and spatial information of the nodes. The AttResUNet is trained to extract high-level features that initialize the graph nodes; the GCN then combines node features and spatial relationships between nodes to perform classification. Notably, exploiting spatial relationship knowledge boosts the performance and robustness of the classification module. In addition, because the GCN operates at the superpixel level, object boundaries are restored to a certain extent and the final classification result contains less pixel-level noise. Extensive experiments on two publicly available datasets show that the DSSN-GCN model outperforms the competitive baseline (i.e., the DSSN model alone) and that DSSN-GCN with AttResUNet achieves the best performance, demonstrating the advantages of our method.
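The classification step described above follows the standard graph convolution scheme: node features (here, DSSN features per superpixel) are propagated through a normalized adjacency matrix. The following is a minimal numpy sketch of one such layer, not the paper's implementation; the symmetric normalization and ReLU activation are the usual GCN conventions, assumed here for illustration.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).

    A: (n, n) adjacency/weight matrix of the superpixel graph
    H: (n, f_in) node features (e.g. pooled DSSN features per superpixel)
    W: (f_in, f_out) learnable weight matrix
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ H @ W, 0.0)  # ReLU activation
```

Because the propagation is over the superpixel graph rather than individual pixels, each node's prediction is smoothed by its neighbors, which is what suppresses pixel-level noise in the final map.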

Highlights

  • Each DSSN-GCN variant improved on its backbone network: overall accuracy (OA)/frequency weighted intersection over union (FWIoU) increased by 1.99%/2.94% for deep semantic segmentation network (DSSN)-graph convolutional neural network (GCN) V1 over U-Net, by 0.89%/1.23% for DSSN-GCN V2 over SegNet, by 0.68%/0.84% for DSSN-GCN V3 over DeepLab V3+, and by 0.69%/1% for DSSN-GCN V4

  • Panels (c) to (j) show the results of U-Net, DSSN-GCN V1, SegNet, DSSN-GCN V2, DeepLab V3+, DSSN-GCN V3, the proposed AttResUNet, and AttResUNet-GCN (DSSN-GCN V4), respectively. The segmentation of each DSSN-GCN model was more accurate and consistent than that of its backbone network (compare (d) with (c), (f) with (e), (h) with (g), and (j) with (i)), demonstrating the effectiveness of the proposed DSSN-GCN model in improving semantic segmentation results

  • The graph weight, denoting the strength of the spatial relationship between nodes, was calculated from the spectral information and spatial information of the nodes
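A weight of this kind can be sketched as the product of a spectral similarity term and a spatial proximity term between adjacent superpixels. The Gaussian-kernel form and the bandwidth parameters `sigma_s` and `sigma_d` below are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def edge_weight(feat_i, feat_j, pos_i, pos_j, sigma_s=0.1, sigma_d=10.0):
    """Weight between two adjacent superpixel nodes.

    Combines spectral similarity (difference of mean colors/features) and
    spatial similarity (centroid distance) via Gaussian kernels.
    sigma_s and sigma_d are illustrative bandwidths, chosen arbitrarily here.
    """
    spec_dist2 = np.sum((np.asarray(feat_i) - np.asarray(feat_j)) ** 2)
    spat_dist2 = np.sum((np.asarray(pos_i) - np.asarray(pos_j)) ** 2)
    spectral = np.exp(-spec_dist2 / (2 * sigma_s ** 2))  # spectral term
    spatial = np.exp(-spat_dist2 / (2 * sigma_d ** 2))   # spatial term
    return spectral * spatial
```

Identical, co-located nodes get weight 1.0, and the weight decays smoothly as superpixels become more dissimilar in color or farther apart, so nearby look-alike regions influence each other most during graph convolution.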


Summary

Introduction

As the fundamental task of geographic information interpretation, remote sensing (RS) image semantic segmentation is the basis for other RS research and applications, such as natural resource protection, land cover mapping and land use change detection [1,2]. Although it has received considerable attention in the past decade, semantic segmentation of high-resolution RS images remains challenging [3,4,5,6] because of the structural complexity of RS images, which leads to interclass similarity and intraclass variability [7,8,9]. With recent developments in deep learning [10,11,12,13,14], the deep semantic segmentation network (DSSN) has brought remarkable improvements to RS image semantic segmentation [15] compared to traditional methods such as random forests (RF), decision trees (DT) and support vector machines (SVMs) [16]. SegNet [20] records the indices of max pooling in the encoder to perform nonlinear upsampling in the decoder
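The SegNet detail mentioned above (reusing max-pooling indices for nonlinear upsampling) can be sketched in a few lines of numpy. This is a generic illustration of the mechanism, not SegNet's actual code: the pooling step records where each maximum came from, and the unpooling step places each value back at that exact position, leaving zeros elsewhere.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k-by-k max pooling that also records argmax positions (flat indices)."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            block = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            out[i, j] = block[r, c]
            idx[i, j] = (i * k + r) * w + (j * k + c)  # flat index in input
    return out, idx

def max_unpool(y, idx, shape):
    """SegNet-style upsampling: place each value at its recorded index."""
    up = np.zeros(shape)
    up.flat[idx.ravel()] = y.ravel()
    return up
```

Restoring values to their original positions is what lets the decoder recover sharper object boundaries than plain interpolation-based upsampling.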

