Abstract

Semantic segmentation of remote sensing imagery (RSI) remains a formidable challenge. Our investigation begins with a pilot study that examines the respective strengths and weaknesses of Transformer and CNN architectures on RSI, with the aim of demonstrating that both local and global information are indispensable for RSI analysis. In this article, we harness the Transformer to establish global contextual understanding while incorporating an additional convolution module for localized perception. However, a direct fusion of these heterogeneous information sources often yields subpar outcomes. To address this limitation, we propose a hierarchical feature-fusion module that fuses Transformer and CNN features in an ensemble-to-set manner, thereby enhancing the compatibility of the two information sources. The resulting model, named FURSformer, amalgamates the strengths of the Transformer architecture and the CNN. Experimental results clearly demonstrate the effectiveness of this approach: our model achieves 90.78% mAccuracy on the DLRSD dataset.
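The abstract describes a two-branch design: a Transformer branch for global context, a convolutional branch for local perception, and a fusion module that reconciles the two feature types before projection. The sketch below is a minimal, hypothetical illustration of that idea in NumPy; the single-head attention, the 3x3 averaging filter standing in for a learned convolution, and the concatenate-then-project fusion are illustrative assumptions, not the paper's actual FURSformer layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_branch(x):
    # x: (N, C) token features. Single-head self-attention with identity
    # Q/K/V projections -- a toy stand-in for the Transformer's global context.
    n, c = x.shape
    attn = softmax(x @ x.T / np.sqrt(c), axis=-1)  # (N, N) attention map
    return attn @ x

def local_branch(x, h, w):
    # x: (H*W, C). A 3x3 per-channel averaging filter as a stand-in for the
    # convolution module that supplies localized perception.
    c = x.shape[1]
    img = x.reshape(h, w, c)
    pad = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += pad[dy:dy + h, dx:dx + w]
    return (out / 9.0).reshape(h * w, c)

def fuse(x, h, w, w_proj):
    # Hypothetical fusion step: concatenate global and local features along
    # the channel axis, then project back to C channels.
    g = global_branch(x)          # (H*W, C) global features
    l = local_branch(x, h, w)     # (H*W, C) local features
    return np.concatenate([g, l], axis=1) @ w_proj  # (H*W, 2C) @ (2C, C)

rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
tokens = rng.standard_normal((H * W, C))
w_proj = rng.standard_normal((2 * C, C)) / np.sqrt(2 * C)
fused = fuse(tokens, H, W, w_proj)
print(fused.shape)  # (64, 16)
```

The point of the projection after concatenation is that naive summation of heterogeneous features tends to perform poorly, which is the limitation the abstract's hierarchical fusion module is designed to address.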
