Abstract

Semantic segmentation has been widely studied for high-level analysis of High Spatial Resolution (HSR) remote sensing images, where Convolutional Neural Networks (CNNs) are the mainstream method. However, Transformers with attention mechanisms have a unique capacity for extracting global information, which is generally overlooked by CNN models. In this paper, a Swin Transformer with UPer head (STUP) is proposed to tackle the semantic segmentation problem on a challenging remote sensing land-cover dataset called LoveDA, which contains complex background samples and inconsistent class distributions. The proposed STUP combines the Swin Transformer with the UPer head in an encoder-decoder structure to extract features from HSR images for segmentation. Furthermore, Focal Loss is adopted to handle the imbalanced class distribution during training. Experimental results demonstrate that the proposed STUP clearly outperforms several state-of-the-art models.
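
As an illustration of the class-imbalance handling mentioned above, the following is a minimal PyTorch-style sketch of a multi-class Focal Loss for dense prediction. The function name, parameter defaults (gamma, alpha, ignore_index), and tensor shapes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None, ignore_index=255):
    """Multi-class focal loss sketch for semantic segmentation.

    logits:  (N, C, H, W) raw class scores.
    targets: (N, H, W) integer class labels.
    gamma:   focusing parameter; larger values down-weight easy pixels more.
    alpha:   optional (C,) tensor of per-class weights.
    """
    # Per-pixel cross-entropy equals -log(p_t) for the true class.
    ce = F.cross_entropy(logits, targets, weight=alpha,
                         ignore_index=ignore_index, reduction="none")
    pt = torch.exp(-ce)                   # probability assigned to the true class
    loss = (1.0 - pt) ** gamma * ce       # modulating factor (1 - p_t)^gamma
    # Average only over valid (non-ignored) pixels.
    valid = targets != ignore_index
    return loss[valid].mean()
```

With gamma = 0 and alpha = None this reduces to standard cross-entropy; increasing gamma shifts the training signal toward hard, under-represented classes such as minority land-cover categories.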
