Abstract

Semantic segmentation of high-resolution remote sensing images underpins tasks such as land cover classification, road extraction, building extraction, and water extraction. However, high-resolution remote sensing images contain rich fine-grained detail, and the fixed receptive field of convolution blocks makes it difficult to model correlations among global features. In addition, overly complex fusion methods fail to effectively integrate spatial and global context information. To address these problems, this paper proposes a cross-linear attention network (CLANet) to capture both spatial and contextual information in images. The network consists of a spatial branch and a context branch: the spatial branch uses stacked convolutions to better capture spatial information, while the context branch models global information with a transformer-based module. To fuse spatial and contextual information effectively, this paper also designs a feature fusion module (FFM), which aggregates features with a cross-linear attention mechanism. Finally, extensive experiments are conducted on the ISPRS Vaihingen and ISPRS Potsdam datasets; CLANet achieves an mIoU of 82.28% on the ISPRS Vaihingen dataset. The experimental results show that CLANet outperforms recent methods in both performance and effectiveness.
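The abstract does not give the exact formulation of the cross-linear attention used in the FFM. As a rough illustration only, the following NumPy sketch shows the general idea of cross-attention with a linear (kernelized, softmax-free) attention form: queries come from one branch (e.g. spatial) and keys/values from the other (e.g. context), with the `elu(x)+1` feature map commonly used in linear attention so that cost scales linearly with the number of spatial positions. All function and variable names here are hypothetical, not from the paper.

```python
import numpy as np

def cross_linear_attention(q_feat, kv_feat):
    """Hypothetical sketch of cross-branch linear attention.

    q_feat:  (N, d) flattened features of one branch (queries)
    kv_feat: (M, d) flattened features of the other branch (keys/values)

    Uses the elu(x)+1 kernel feature map, so the (d, d) key-value summary
    is computed once and attention cost is linear in N and M.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, strictly positive
    Q, K, V = phi(q_feat), phi(kv_feat), kv_feat
    kv = K.T @ V                    # (d, d) summary of keys/values
    z = Q @ K.sum(axis=0)           # per-query normalizer
    return (Q @ kv) / z[:, None]    # (N, d) attended features

# Toy fusion of the two branches: attend each way and sum (an assumption,
# not necessarily how the paper's FFM combines the attended features).
rng = np.random.default_rng(0)
spatial = rng.standard_normal((16, 8))   # e.g. 4x4 map, 8 channels, flattened
context = rng.standard_normal((16, 8))
fused = cross_linear_attention(spatial, context) + cross_linear_attention(context, spatial)
```

Because the feature map is positive, the linear form is algebraically identical to explicit attention with weights `phi(Q) @ phi(K).T` normalized per row, just computed in a different association order.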
