Hyperspectral image (HSI) classification is an active research problem in the computer vision and multimedia fields. In contrast to traditional image data, HSIs contain rich spectral, spatial, and semantic information. Thus, how to extract discriminative features for HSIs by integrating spectral, spatial, and semantic cues together is the core issue in the HSI classification task. Existing works mainly focus on exploiting spectral and spatial information and usually fail to fully explore the rich semantic information in HSIs. To address this issue, in this paper we first propose a novel semantic Transformer scheme, named SemanticFormer, which learns discriminative visual representations of semantics by exploiting the interaction among different semantic tokens. Building on the proposed SemanticFormer, we then propose a novel heterogeneous network that contains both a spectral–spatial convolutional network branch and a SemanticFormer branch to extract spectral–spatial and semantic features simultaneously for HSIs. Experiments on two widely used datasets demonstrate the effectiveness of our SemanticFormer and the proposed HSI classification network. Our code will be available at https://github.com/SissiW/SemanticFormer.
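The core mechanism the abstract describes, interaction among semantic tokens via a Transformer, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a generic scaled dot-product self-attention over a set of token vectors (the function name, dimensions, and random weights are illustrative assumptions), showing how each semantic token is updated by attending to all others.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_token_attention(tokens, Wq, Wk, Wv):
    """Scaled dot-product self-attention over semantic tokens.

    tokens: (n_tokens, d) array; each row is one semantic token.
    Wq, Wk, Wv: (d, d) projection matrices (learned in a real model;
    random here purely for illustration).
    """
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # token-to-token affinities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # tokens mixed by interaction

rng = np.random.default_rng(0)
n_tokens, d = 5, 8
tokens = rng.standard_normal((n_tokens, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = semantic_token_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one updated representation per semantic token
```

In the paper's setting, the output tokens from such attention layers would form the SemanticFormer branch's features, to be fused with features from the spectral–spatial convolutional branch.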