DWin-HRFormer: A High-Resolution Transformer Model With Directional Windows for Semantic Segmentation of Urban Construction Land

Zhen Zhang, Jiayi Li, Xin Huang

doi:10.1109/tgrs.2023.3241366


Abstract

This paper proposes a deep neural network for semantic segmentation of high-resolution remote sensing images, aimed at urban construction land classification. The network follows a high-resolution network architecture. Specifically, a directional self-attention is applied on the paths of different resolutions to correct the directional bias that strip-window attention introduces during model learning while also reducing computational complexity, which allows the model to improve both accuracy and speed. At the end of the network, a distributed alignment module with spatial information is constructed to train additional learnable parameters that adjust the biased decision boundaries through a two-stage learning strategy, alleviating the accuracy degradation caused by unbalanced training data. We tested the proposed method against current state-of-the-art semantic segmentation methods on the Luojia-FGLC and WHDLD datasets, where it obtained the best performance. We also verified the effectiveness of each component of the network through ablation experiments. The code and model will be available at https://github.com/Zhzhyd/DWin-HRFormer.
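To make the complexity argument for strip windows concrete, the following minimal sketch counts query-key pairs for global self-attention versus attention restricted to horizontal or vertical strips. It is an illustration only and not the directional window design of the paper: the function names (`strip_windows`, `attention_pairs`, `global_attention_pairs`), the grid size, and the strip width are all assumptions introduced here.

```python
def strip_windows(height, width, strip, direction):
    """Partition the token coordinates of a height x width grid into strips.

    Hypothetical helper: 'horizontal' strips span the full width and are
    `strip` rows tall; 'vertical' strips span the full height and are
    `strip` columns wide.
    """
    if direction == "horizontal":
        if height % strip:
            raise ValueError("height must be divisible by strip")
        return [
            [(r, c) for r in range(start, start + strip) for c in range(width)]
            for start in range(0, height, strip)
        ]
    if direction == "vertical":
        if width % strip:
            raise ValueError("width must be divisible by strip")
        return [
            [(r, c) for r in range(height) for c in range(start, start + strip)]
            for start in range(0, width, strip)
        ]
    raise ValueError("direction must be 'horizontal' or 'vertical'")


def attention_pairs(windows):
    """Query-key pairs when attention is computed only inside each window."""
    return sum(len(window) ** 2 for window in windows)


def global_attention_pairs(height, width):
    """Query-key pairs for attention over all tokens at once."""
    return (height * width) ** 2


if __name__ == "__main__":
    h, w, s = 8, 8, 2  # assumed toy sizes, not taken from the paper
    for direction in ("horizontal", "vertical"):
        pairs = attention_pairs(strip_windows(h, w, s, direction))
        print(direction, pairs, "vs global", global_attention_pairs(h, w))
```

For an 8 x 8 grid with strips of width 2, each direction needs 1024 pairs against 4096 for global attention. A single strip direction sees long-range context only along one axis, which is one plausible reading of the directional bias the abstract refers to.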