Abstract

Optical and synthetic aperture radar (SAR) images, two standard Earth observation tools, reflect surface characteristics from different perspectives and provide complementary information for land use classification. However, because they belong to different modalities and represent land objects differently, fusing them effectively for pixel-wise classification is challenging. Current methods focus only on the local receptive field and fuse deep features in a single dimension, which is too simple to fully exploit the correlation between the modalities. Moreover, the appearance disparities between the modalities may induce semantic misalignment and hinder feature fusion. To overcome these problems, we introduce a spatial-aware circular module that generates a cross-modality receptive field and globally enhances the interaction between pixels in the spatial dimension. Additionally, we recalibrate the features in the channel dimension to selectively emphasize and retain essential information, further refining the features. To reduce the impact of modal appearance disparities, we transform the high-level features of both modalities into a common latent space and align their distributions to correlate the complementary cues hidden in each modality. Experimental results on the WHU-OPT-SAR dataset show that our method outperformed other state-of-the-art methods, with a mean intersection over union (mIoU) of 58.5% and an overall accuracy (OA) of 84.2%. Furthermore, the method obtained competitive results in Ezhou and Panjin, China, demonstrating its applicability.
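
The following is a minimal, illustrative sketch of the channel-dimension recalibration idea mentioned above, applied to concatenated optical and SAR feature maps. It is not the paper's implementation; the module structure, reduction ratio, and all names (ChannelRecalibration, gate, fused) are assumptions chosen to show a squeeze-and-excitation-style gate, one common way to realize channel-wise feature refinement.

```python
# Hypothetical sketch: channel-wise recalibration of fused optical/SAR features.
# All module and variable names are illustrative, not from the paper.
import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    """Squeeze-and-excitation style gate over the channel dimension."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # global spatial context per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                 # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)                           # re-weight channels, keep shape

# Example: fuse optical and SAR feature maps by concatenation, then recalibrate.
optical = torch.randn(2, 64, 32, 32)                      # (batch, channels, height, width)
sar = torch.randn(2, 64, 32, 32)
fused = torch.cat([optical, sar], dim=1)                  # 128 channels after fusion
refined = ChannelRecalibration(128)(fused)                # same shape, channels re-weighted
print(refined.shape)                                      # torch.Size([2, 128, 32, 32])
```

The gate learns which channels of the fused representation carry complementary cues and suppresses the rest, which is the role the abstract assigns to the channel-recalibration step.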
