ABSTRACT The CNN-Transformer joint model is the leading architecture for contemporary hyperspectral image (HSI) classification, integrating global and local features through either successive or dual-branch CNN and Transformer networks. However, these methods often fail to combine spatial-spectral information with local-global attributes effectively, resulting in incomplete feature representation. To address this limitation, we propose a spectral-spatial global-local interaction network that transmits global and local features into the spatial and spectral branches, with cross-interaction operators ensuring adequate feature flow. First, CNNs separately extract shallow features for the spectral and spatial branches. We then introduce a Spectral-Spatial Global-Local Interaction block for deep feature extraction, which enhances the flow of spectral and spatial features with local and global attributes using parallel Transformers and dynamic convolutions. The Transformers model the long-range dependencies of global spectral and spatial features, while the dynamic convolutions enhance the context sensitivity of local spectral and spatial representations. Quadruple cross-interaction blocks traverse both the spectral-spatial branches and the local-global attribute dimensions, facilitating information exchange for complementary HSI representation. Extensive experiments and ablation studies on four public HSI datasets demonstrate the superiority of the proposed method.
KEYWORDS Convolutional neural networks (CNNs); spectral and spatial; global and local; hyperspectral image classification; cross-interaction.