Hyperspectral image classification aims to identify ground objects in hyperspectral images at the pixel level. Although many CNN-based methods have been applied successfully in this field, they often struggle to extract inter-spectral information effectively because of the high spectral dimensionality of hyperspectral images. Recently, the Transformer has been introduced to capture the interdependence among spectral bands, but it tends to lose the ability to capture local contextual features. To address the limitations of both CNNs and Transformers, we propose a novel approach that integrates the advantages of the two models: a CNN extracts spatial information, a Transformer extracts spectral information, and the two feature streams are fused before being fed into an MLP for classification. In addition, we introduce a sparse strategy to eliminate the impact of redundant spectral bands on the Transformer's performance. Our method fully exploits the spatial and spectral information in hyperspectral image data and achieves excellent performance on hyperspectral datasets. By combining CNN and Transformer models, it effectively captures both spatial and spectral features, providing a powerful tool for hyperspectral image classification.
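The abstract does not specify layer sizes, patch shapes, the fusion operator, or how the sparse band-selection strategy is implemented, so the following is only a minimal sketch of how such a dual-branch design could be assembled in PyTorch. All class names (`SpectralSparsifier`, `DualBranchHSIClassifier`), dimensions, and the top-k gating used for the sparse strategy are illustrative assumptions, not the paper's actual architecture.

```python
# Illustrative sketch only; all hyperparameters and the gating mechanism are assumptions.
import torch
import torch.nn as nn


class SpectralSparsifier(nn.Module):
    """Hypothetical sparse strategy: learn a per-band gate and keep only the
    strongest bands, suppressing redundant spectral channels."""

    def __init__(self, num_bands: int, keep_ratio: float = 0.5):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(num_bands))
        self.keep = max(1, int(num_bands * keep_ratio))

    def forward(self, x):                           # x: (B, bands, H, W)
        scores = torch.sigmoid(self.gate)
        topk = torch.topk(scores, self.keep).indices
        mask = torch.zeros_like(scores).scatter_(0, topk, 1.0)
        return x * (scores * mask).view(1, -1, 1, 1)


class DualBranchHSIClassifier(nn.Module):
    """CNN branch for spatial context, Transformer branch for inter-band
    dependencies; features are fused and classified by an MLP head."""

    def __init__(self, num_bands: int, num_classes: int,
                 patch_size: int = 9, dim: int = 64):
        super().__init__()
        self.sparsify = SpectralSparsifier(num_bands)
        # Spatial branch: small 2-D CNN over the local patch.
        self.cnn = nn.Sequential(
            nn.Conv2d(num_bands, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Spectral branch: each band's flattened patch becomes one token.
        self.band_embed = nn.Linear(patch_size * patch_size, dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Fusion by concatenation, then an MLP classification head.
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes))

    def forward(self, x):                           # x: (B, bands, H, W)
        x = self.sparsify(x)
        spatial = self.cnn(x).flatten(1)            # (B, dim)
        tokens = self.band_embed(x.flatten(2))      # (B, bands, dim)
        spectral = self.transformer(tokens).mean(dim=1)   # (B, dim)
        return self.mlp(torch.cat([spatial, spectral], dim=1))


# Example usage on a batch of 9x9 patches with 200 spectral bands.
model = DualBranchHSIClassifier(num_bands=200, num_classes=16, patch_size=9)
logits = model(torch.randn(4, 200, 9, 9))           # (4, 16)
```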