With the recent development of remote sensing technology and deep learning, semantic segmentation methods have been increasingly used in land cover classification. However, these methods face the challenge of incomplete recognition caused by large differences in the scale of ground objects. Owing to multi-head self-attention, the Swin Transformer Network (Swin) has a large receptive field even at its shallow levels, which is conducive to identifying large-scale objects. However, Swin does not fully exploit the contextual information of features, which easily leads to incomplete recognition. Building on Swin, we propose a parallel window-based Transformer network, the Parallel Swin Transformer Network (P-Swin). The core of P-Swin is the Parallel Swin Transformer Block (PST Block), which comprises Window-based Self Attention Interaction (WSAI) and a Feed Forward Network (FFN). WSAI not only computes relationships within windows but also establishes relationships between windows, improving the network's ability to capture contextual feature information. P-Swin outperformed Swin and achieved the highest scores on three benchmarks: 76.42% mIoU on the ISPRS Potsdam 2D test set (Swin: 75.95%), 65.13% mIoU on the Gaofen Image Dataset test set (Swin: 63.41%), and 64.61% mIoU on the WHDLD test set (Swin: 63.01%).
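The abstract does not give implementation details for WSAI, but the stated idea is that attention is computed within each window and additionally between windows. The following PyTorch sketch shows one plausible reading of that mechanism: standard self-attention inside non-overlapping windows, followed by attention among pooled per-window tokens. The class name `WSAI` follows the abstract, but the mean-pooling, broadcast update, and all layer choices are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class WSAI(nn.Module):
    """Hypothetical sketch of Window-based Self Attention Interaction:
    (1) self-attention within each window, (2) attention among per-window
    summary tokens to relate windows to each other. Details are assumed."""

    def __init__(self, dim: int, window_size: int, num_heads: int):
        super().__init__()
        self.window_size = window_size
        self.intra = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C); H and W assumed divisible by window_size.
        B, H, W, C = x.shape
        s = self.window_size
        # Partition into non-overlapping windows: (B * num_windows, s*s, C).
        win = (x.view(B, H // s, s, W // s, s, C)
                .permute(0, 1, 3, 2, 4, 5)
                .reshape(-1, s * s, C))
        # (1) Intra-window self-attention: relationships within windows.
        win = win + self.intra(win, win, win, need_weights=False)[0]
        # (2) Inter-window interaction (assumption): mean-pool each window
        # to one token, attend across windows, broadcast the update back.
        nw = win.shape[0] // B
        tok = win.mean(dim=1).view(B, nw, C)
        tok = self.inter(tok, tok, tok, need_weights=False)[0]
        win = win + tok.view(B * nw, 1, C)
        # Merge windows back to the original (B, H, W, C) layout.
        return (win.view(B, H // s, W // s, s, s, C)
                   .permute(0, 1, 3, 2, 4, 5)
                   .reshape(B, H, W, C))

# Usage sketch: a 4x4-patch feature map with 96 channels, 2x2 windows.
feats = torch.randn(1, 4, 4, 96)
out = WSAI(dim=96, window_size=2, num_heads=4)(feats)
assert out.shape == feats.shape
```

The inter-window step is what plain window attention lacks: without it, tokens in different windows never exchange information inside a single block, which is one way to understand the incomplete-recognition problem the abstract attributes to Swin.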