Abstract High-resolution remote sensing (HRRS) image scene classification has gained increasing importance in recent years, with convolutional neural networks (CNNs) showing particular promise due to their proficiency in extracting spatial features. However, traditional CNNs face significant limitations: they struggle to capture complex semantic relationships between objects at varying scales, and they cannot effectively model long-distance dependencies between features. These limitations are especially problematic in HRRS images, where spatial relationships and semantic content are deeply intertwined. Traditional CNNs also cope poorly with the substantial intra-class variation and inter-class similarity that are common in remote sensing images. To overcome these challenges, we introduce a novel Residual Channel-attention (RCA) network for scene classification. The RCA network uses a lightweight residual structure to better capture multi-scale spatial features and incorporates a channel attention mechanism that selectively emphasizes relevant feature channels while suppressing irrelevant ones. To further sharpen the focus on critical image features, we integrate a squeeze-and-excitation (SE) mechanism as a self-attention component, which helps the network prioritize the most informative features and ignore background noise. We evaluated the RCA network on three public datasets, RSSCN7, PatternNet, and EuroSAT, achieving classification accuracies of 97%, 99%, and 96%, respectively. The results demonstrate the superiority of the RCA network over state-of-the-art methods in remote sensing image classification. Furthermore, visualization using the Grad-CAM++ algorithm highlights the effectiveness of our channel attention mechanism and underscores the RCA network’s robust feature representation capabilities.
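The channel attention idea described above can be illustrated with a minimal, dependency-free sketch of a generic squeeze-and-excitation step followed by an identity shortcut. This is not the RCA network itself: the convolutional branch is omitted, and the function names (`se_weights`, `residual_channel_attention`), the reduction ratio, the tensor sizes, and the random weights are all assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_weights(feature_map, w1, b1, w2, b2):
    """Generic squeeze-and-excitation: one weight in (0, 1) per channel."""
    # Squeeze: global average pooling over each H x W channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    return [sigmoid(z) for z in dense(hidden, w2, b2)]


def residual_channel_attention(feature_map, w1, b1, w2, b2):
    """Rescale each channel by its SE weight, then add the identity shortcut.

    Assumption: a real residual block would apply convolutions before the
    rescaling; they are left out here to keep the sketch short.
    """
    scales = se_weights(feature_map, w1, b1, w2, b2)
    return [[[x + s * x for x in row] for row in ch]
            for ch, s in zip(feature_map, scales)]


# Illustrative sizes: channels, height, width, reduction ratio.
random.seed(0)
C, H, W, R = 4, 3, 3, 2
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
b1 = [0.0] * (C // R)
w2 = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
b2 = [0.0] * C

scales = se_weights(x, w1, b1, w2, b2)
y = residual_channel_attention(x, w1, b1, w2, b2)
```

Each entry of `scales` acts as a soft gate on one channel: values near 1 pass the channel through the attention branch almost unchanged, while values near 0 suppress it, and the shortcut keeps the original signal available either way.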