Semantic segmentation plays a pivotal role in the intelligent interpretation of remote sensing images (RSIs). However, conventional methods predominantly focus on learning representations within the spatial domain, often resulting in suboptimal discriminative capabilities. Given the intrinsic spectral characteristics of RSIs, it becomes imperative to enhance the discriminative potential of these representations by integrating spectral context alongside spatial information. In this paper, we introduce the spectrum-space collaborative network (SSCNet), which is designed to capture both spectral and spatial dependencies, thereby elevating the quality of semantic segmentation in RSIs. Our innovative approach features a joint spectral–spatial attention module (JSSA) that concurrently employs spectral attention (SpeA) and spatial attention (SpaA). Instead of feature-level aggregation, we propose the fusion of attention maps to gather spectral and spatial contexts from their respective branches. Within SpeA, we calculate the position-wise spectral similarity using the complex spectral Euclidean distance (CSED) of the real and imaginary components of projected feature maps in the frequency domain. To comprehensively calculate both spectral and spatial losses, we introduce edge loss, Dice loss, and cross-entropy loss, subsequently merging them with appropriate weighting. Extensive experiments on the ISPRS Potsdam and LoveDA datasets underscore SSCNet’s superior performance compared with several state-of-the-art methods. Furthermore, an ablation study confirms the efficacy of SpeA.
Read full abstract