Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral image classification
- Conference Article
1
- 10.1145/3641584.3641609
- Sep 22, 2023
With continued innovation in deep learning, applying it to hyperspectral image (HSI) classification to improve accuracy has become a major research direction. Convolutional neural networks (CNNs) are among the most commonly used deep learning methods for visual data and are widely used in HSI classification by virtue of their excellent contextual modeling capability. However, current CNN-based HSI classification models extract spatial-spectral features insufficiently and fail to exploit and represent the sequence properties of spectral features well. Since the performance of HSI classification is highly dependent on spatial and spectral information, this paper proposes an HSI classification method that uses a 3D attention mechanism in collaboration with a Transformer. We introduce a variant Transformer model built on a hybrid of improved 3D-CNN and 2D-CNN. The model combines complementary spatial-spectral and spectral information through 3D and 2D convolutions, adds a variant attention mechanism module to strengthen spatial texture features, and uses a grouped transfer Transformer with skip connections so that lower layers can better learn upper-layer features. Firstly, a variant channel attention mechanism is introduced on the 3D-CNN to enhance its acquisition of spectral information from image features. Secondly, a variant spatial attention mechanism is introduced so that the 3D-CNN can better acquire the spatial information of hyperspectral images, and the acquired spatial and spectral features are then passed to the 2D-CNN so that it can better acquire local feature information. 
Finally, the acquired image features are passed to the variant Transformer model, which compensates for the fact that a CNN can only acquire hyperspectral image features in local contexts and better captures global information across feature sequences. The proposed model was evaluated on two hyperspectral datasets, Indian Pines and Pavia University (PU). On the PU dataset, the overall classification accuracy (OA), average classification accuracy (AA), and Kappa coefficient reach up to 99.59%, 99.31%, and 99.45%, respectively, an improvement in classification accuracy over current cutting-edge techniques.
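The three metrics quoted in the abstract above (OA, AA, and the Kappa coefficient) are standard quantities derived from a confusion matrix. The following is a minimal sketch of how they are commonly computed, not the evaluation code of the cited paper; the function name `classification_scores` and the toy matrix are illustrative assumptions, and every class is assumed to have at least one true sample.

```python
import numpy as np

def classification_scores(conf):
    """Return (OA, AA, Kappa) for a confusion matrix.

    conf[i, j] is the number of samples whose true class is i and whose
    predicted class is j. Assumes every class has at least one true sample.
    """
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    # Overall accuracy: fraction of all samples on the diagonal.
    oa = np.trace(conf) / total
    # Average accuracy: mean of the per-class recalls.
    aa = (np.diag(conf) / conf.sum(axis=1)).mean()
    # Chance agreement from the row and column marginals.
    expected = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2
    # Cohen's kappa: agreement beyond chance, normalised.
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa

# Hypothetical two-class example.
oa, aa, kappa = classification_scores([[8, 2], [1, 9]])
print(oa, aa, kappa)
```

AA weights every class equally, so it can differ noticeably from OA on imbalanced scenes such as Indian Pines, which is why abstracts in this list tend to report both.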
- Research Article
8
- 10.1080/01431161.2024.2370501
- Jul 5, 2024
- International Journal of Remote Sensing
Graph neural networks (GNNs) have recently garnered significant attention due to their exceptional performance across various applications, including hyperspectral (HS) image classification. However, most existing GNN-based models for HS image classification are limited depth models and often suffer from performance degradation as model depth increases. This study introduces HyperGCN, an exclusive GNN-based model designed with multiple graph convolutional layers to exploit the rich spectral information inherent in HS images, thereby enhancing classification performance. To address performance degradation, HyperGCN incorporates techniques resistant to oversmoothing into its architecture. Additionally, multiple-side exit branches are integrated into the intermediate layers of HyperGCN, enabling dynamic management of the complexity of HS images. Less complex HS images are processed by fewer layers, exiting early via attached branches, while more complex images traverse multiple layers until reaching the final output layer. Extensive experiments on four benchmark HS datasets (Indian Pines, Pavia University, Salinas, and Botswana) demonstrate HyperGCN’s superior performance over basic GNN-based models. Notably, HyperGCN outperforms or performs comparably to the CNN-GNN combined model in classifying HS images. Furthermore, the superior performance of multi-exit HyperGCN over its single-exit counterpart emphasizes the effectiveness of incorporating side exit branches in GNN-based HS image classification. Compared to state-of-the-art models, multi-exit HyperGCN demonstrates competitive performance, highlighting its effectiveness in handling complex spectral information in HS images while maintaining an acceptable balance between accuracy and computational efficiency.
- Research Article
2
- 10.3390/rs17122008
- Jun 11, 2025
- Remote Sensing
Deep learning has recently achieved remarkable progress in hyperspectral image (HSI) classification. Among these advancements, the Transformer-based models have gained considerable attention due to their ability to establish long-range dependencies. However, the quadratic computational complexity of the self-attention mechanism limits its application in hyperspectral image classification (HSIC). Recently, the Mamba architecture has shown outstanding performance in 1D sequence modeling tasks owing to its lightweight linear sequence operations and efficient parallel scanning capabilities. Nevertheless, its application in HSI classification still faces challenges. Most existing Mamba-based approaches adopt various selective scanning strategies for HSI serialization, ensuring the adjacency of scanning sequences to enhance spatial continuity. However, these methods lead to substantially increased computational overhead. To overcome these challenges, this study proposes the Hyperspectral Spatial Mamba (HyperSMamba) model for HSIC, aiming to reduce computational complexity while improving classification performance. The suggested framework consists of the following key components: (1) a Multi-Scale Spatial Mamba (MS-Mamba) encoder, which refines the state-space model (SSM) computation by incorporating a Multi-Scale State Fusion Module (MSFM) after the state transition equations of original SSMs. This module aggregates adjacent state representations to reinforce spatial dependencies among local features; (2) our proposed Adaptive Fusion Attention Module (AFAttention) to dynamically fuse bidirectional Mamba outputs for optimizing feature representation. Experiments were performed on three HSI datasets, and the findings demonstrate that HyperSMamba attains overall accuracy of 94.86%, 97.72%, and 97.38% on the Indian Pines, Pavia University, and Salinas datasets, while maintaining low computational complexity. 
These results confirm the model’s effectiveness and potential for practical application in HSIC tasks.
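For readers unfamiliar with the state transition equations mentioned in the abstract above, the sketch below shows a plain discrete linear state-space recurrence, h_t = A h_{t-1} + B x_t and y_t = C h_t, scanned in both directions over a serialized sequence. It is an illustration under simplifying assumptions: the matrices are fixed rather than input-dependent as in Mamba, the multi-scale state fusion of HyperSMamba is not modelled, and the two directions are merged by a plain sum as a stand-in for the adaptive fusion attention the abstract describes. All function names are hypothetical.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t over a sequence.

    x: (T, d_in), A: (n, n), B: (n, d_in), C: (d_out, n).
    Returns y with shape (T, d_out).
    """
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:
        h = A @ h + B @ x_t      # state transition
        outputs.append(C @ h)    # readout
    return np.stack(outputs)

def bidirectional_scan(x, A, B, C):
    """Scan forward and backward, then merge the two outputs.

    The sum is a placeholder assumption; the cited model fuses the two
    directions with a learned attention module instead.
    """
    forward = ssm_scan(x, A, B, C)
    backward = ssm_scan(x[::-1], A, B, C)[::-1]
    return forward + backward

# Toy sequence of three scalar tokens with a one-dimensional state.
A = np.array([[0.5]])
B = np.array([[1.0]])
C = np.array([[1.0]])
x = np.array([[1.0], [0.0], [0.0]])
print(bidirectional_scan(x, A, B, C))
```

The loop makes the linear cost in sequence length explicit: each token triggers one state update, in contrast to the pairwise token comparisons of self-attention.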
- Research Article
- 10.1080/01431161.2026.2658272
- Apr 17, 2026
- International Journal of Remote Sensing
Hyperspectral image (HSI) classification is based on the principle that the same object exhibits the same spectrum. However, due to the presence of spectral mixing, relying solely on spectral information makes it difficult to achieve accurate classification. Therefore, effectively extracting and integrating spatial-spectral information are crucial for HSI classification. Modelling long-range dependencies among spatial pixels to extract global spatial context is helpful for identifying and understanding land-cover categories and spatial structure distribution in the image. In recent years, the Mamba model has attracted much attention and has been widely applied in HSI classification due to its ability to model long-range dependencies with linear computational complexity. However, it is challenging for a single Mamba model to comprehensively understand spatial and spectral information. Therefore, we propose a novel HSI classification model named NexusMamba, which combines Mamba with the convolutional network to extract spatial and spectral information separately and adaptively integrates spatial-spectral information. Specifically, we design a global spatial Mamba module (GSMM) to model long-range dependencies at the pixel-level for the entire image. Subsequently, we propose a local spectral convolution module (LSCM) to capture local detail information in spectral bands and extract spectral features from a local perspective. Finally, we propose a spatial-spectral adaptive fusion module (SSAFM) to adaptively integrate the spatial and spectral features of HSI. To evaluate the classification performance of NexusMamba, we conducted extensive experiments on three different HSI datasets. The experimental results demonstrate its superior performance in terms of classification accuracy and efficiency. Specifically, NexusMamba achieves OA improvements of 1.96%, 1.46% and 1.78% on the PU, IP and HongHu datasets, respectively. 
This also reveals that Mamba is expected to become the core backbone of next-generation HSI classification models.
- Research Article
43
- 10.1016/j.knosys.2020.106319
- Jul 29, 2020
- Knowledge-Based Systems
Hyperspectral image classification based on discriminative locality preserving broad learning system
- Research Article
10
- 10.1117/1.jrs.12.035003
- Jun 15, 2018
- Journal of Applied Remote Sensing
Due to its excellent performance in terms of fast implementation, strong generalization capability and straightforward solution, extreme learning machine (ELM) has attracted increasing attention in pattern recognition tasks such as face recognition and hyperspectral image (HSI) classification. However, the performance of ELM for HSI classification remains a challenging problem, especially in the effective extraction of feature information from the massive volume of data. To this end, we propose in this paper a new method that combines a convolutional neural network (CNN) with ELM (CNN-ELM) for HSI classification. As CNN has been successfully applied for feature extraction in different applications, the combined CNN-ELM approach aims to take advantage of these two techniques for improved classification of HSI. By preserving the spatial features whilst reconstructing the spectral features of HSI, the proposed CNN-ELM method can significantly improve the accuracy of HSI classification without increasing the computational complexity. Comprehensive experiments using three publicly available HSI data sets, Pavia University, Pavia Center, and Salinas, have fully validated the improved performance of the proposed method when benchmarked against several state-of-the-art approaches.
- Conference Article
82
- 10.1109/ictai.2016.0158
- Nov 1, 2016
CNNs (convolutional neural networks) have been proved to be efficient deep learning models that can directly extract high-level features from raw data. In this paper, a novel CCS (Cube-CNN-SVM) method is proposed for hyperspectral image classification, which is a spectral-spatial feature based hybrid model of CNN and SVM (support vector machine). Unlike most traditional methods, which only take spectral information into consideration, the method organizes a target pixel and the spectral information of its neighbors into a spectral-spatial multi-feature cube used for hyperspectral image classification. It is a straightforward but valid spatial strategy that can easily improve classification accuracy without modifying the deep CNN's structure beyond the size of the input layer and convolutional kernel. Our deep CNN consists of an input layer, convolutional layer, max pooling layer, fully connected layer and output layer. To further improve hyperspectral image classification accuracy, an SVM is trained as the hyperspectral image classifier on the features extracted by the deep CNN from the spectral-spatial fusion information. Three hyperspectral image datasets, KSC (Kennedy Space Center), PU (Pavia University Scene) and Indian Pines, are used to evaluate the performance of the CCS method. Experimental results indicate that hyperspectral image classification can be improved efficiently with the spectral-spatial fusion strategy and the CCS method. Firstly, the spatial strategy is easy to implement and improves classification accuracy by about 4% compared with using only spectral information, reaching 98.49% on the KSC dataset. Secondly, the CCS method further improves classification accuracy by about 1%~3% compared to the best performance of the deep CNN, reaching 99.45% on the PU dataset.
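The spectral-spatial cube described in the abstract above, a target pixel together with the spectra of its neighbors, can be sketched as a windowed slice of the image array. This is a minimal illustration rather than the authors' preprocessing: the function name `extract_cube`, the 5x5 default window, and the reflect padding at image borders are all assumptions.

```python
import numpy as np

def extract_cube(image, row, col, window=5):
    """Return the (window, window, bands) cube centred on pixel (row, col).

    image has shape (height, width, bands). Borders are handled with
    reflect padding, which is an assumption made for this sketch.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the target pixel is centred")
    r = window // 2
    # Pad only the two spatial axes; the spectral axis is left untouched.
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    # After padding, original pixel (row, col) sits at (row + r, col + r).
    return padded[row:row + window, col:col + window, :]

# Toy 4x4 image with 3 bands.
image = np.arange(4 * 4 * 3).reshape(4, 4, 3)
cube = extract_cube(image, 0, 0, window=3)
print(cube.shape)
```

Each labelled pixel yields one cube, so the classifier sees local spatial context without any change to the network beyond its input size, which matches the point made in the abstract.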
- Research Article
10
- 10.3390/rs13183592
- Sep 9, 2021
- Remote Sensing
Hyperspectral image (HSI) classification is one of the major problems in the field of remote sensing. In particular, graph-based HSI classification is a promising topic and has received increasing attention in recent years. However, graphs with pixels as nodes are large, which increases the computational burden. Moreover, satisfactory classification results are often not obtained unless spatial information is considered when constructing the graph. To address these issues, this study proposes an efficient and effective semi-supervised spectral-spatial HSI classification method based on a sparse superpixel graph (SSG). In the constructed sparse superpixel graph, each vertex represents a superpixel instead of a pixel, which greatly reduces the size of the graph. Meanwhile, both spectral information and spatial structure are considered by using superpixels, local spatial connections and global spectral connections. To verify the effectiveness of the proposed method, three real hyperspectral images, Indian Pines, Pavia University and Salinas, are chosen to test its performance. Experimental results show that the proposed method achieves good classification performance on the three benchmarks. Compared with several competitive superpixel-based HSI classification approaches, the method has the advantages of high classification accuracy (>97.85%) and rapid implementation (<10 s). This clearly favors the application of the proposed method in practice.
- Research Article
- 10.5194/ica-abs-1-233-2019
- Jul 15, 2019
- Abstracts of the ICA
Abstract. Hyperspectral images (HSIs) contain hundreds of spectral bands, providing high-resolution spectral information pertaining to the Earth’s surface. Additionally, abundant spatial contextual information can also be obtained simultaneously from a HSI. To characterize the properties of ground objects, classification is the most widely-used technology in the field of remote sensing, where each pixel in a HSI is assigned to a pre-defined class. Over the past decade, deep learning has attracted increasing attention in the machine-learning and computer-vision domains, due to its favourable performances for various types of tasks, and it has been successfully introduced to the remote-sensing community. Instead of utilizing the shallow features within a given image, which is the approach that is generally adopted in other conventional classification methods, deep-learning algorithms can extract hierarchical features from raw HSI data. Within the deep-learning framework, recurrent neural networks (RNNs), which are able to encode sequential features, have exhibited promising capabilities and have achieved encouraging performances, especially for the natural-language processing and speech-recognition communities. As multi-temporal remote-sensing images can be readily obtained from increasing numbers of satellite and unmanned aircraft systems, and since analysis of such multi-temporal data comprises a critical issue within numerous research subfields, including land-cover and land-change analyses, and land-resource management, RNNs have been applied in recent studies in order to extract temporal sequential features from multi-temporal remote-sensing images for the purpose of image classification. Apart from using multi-temporal image datasets, RNNs can also be utilized on a single image, where the spectral feature/band of each individual pixel can be taken as a sequential feature for the input layer of RNNs. 
However, the application of such sequential feature extraction that relies on a single image still needs to be further investigated since applying RNNs to spectral bands will directly introduce more parameters that need to be optimized, consequently increasing the total training time. In this study, we propose a novel RNN-based HSI classification framework. In this framework, unlabelled pixels obtained from a single image are considered when constructing sequential features. Two spatial similarity measurements, referred to as pixel-matching and block-matching, respectively, are employed to extract pixels that are “similar” to the target pixel. Then, the sequential feature of the target pixel is constructed by exploiting several of the most “similar” pixels and ordering them based on their similarities to the target pixel. The aforementioned two schemes are advantageous, as unlabelled pixels within the given HSI are taken into consideration for similarity measurement and sequential feature construction for the RNN model. Moreover, the block-matching scheme also takes advantage of spatial contextual information, which has been widely utilized in spatial-spectral-based HSI classification methods. To evaluate the proposed methods, two benchmark HSIs are used, including a HSI collected over Pavia University, Italy by the airborne Reflective Optics System Imaging Spectrometer (ROSIS) sensor, and an image acquired over the Salinas Valley, California, USA via the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. Spatio-temporally coincident ground-reference data accompanies each of these respective HSIs. 
In addition, the proposed methods are compared with three state-of-the-art algorithms, including support vector machine (SVM), the 1-dimensional convolutional neural network (1DCNN), and the 1-dimensional RNN (1DRNN). Experimental results indicate that our proposed methods achieve markedly better classification performance compared with the baseline algorithms on both datasets. For example, for the Pavia University image, the block-matching based RNN achieves the highest overall classification accuracy, with 94.32% accuracy, which is 9.87% higher than the next most accurate algorithm of the aforementioned three baseline methods, which in this case is the 1DCNN, with 84.45% overall accuracy. More specifically, the block-matching method performs better than the pixel-matching method in terms of both quantitative and qualitative assessments. Based on visual assessment/interpretation of the classification maps, it is apparent that “salt-and-pepper” noise is markedly alleviated; with block-matching, smoother classified images are generated compared with pixel-matching-based methods and the three baseline algorithms. Such results demonstrate the effectiveness of utilizing spatial contextual information in the similarity measurement.
- Research Article
223
- 10.1109/tgrs.2019.2902568
- Apr 18, 2019
- IEEE Transactions on Geoscience and Remote Sensing
Recently, hyperspectral image (HSI) classification approaches based on deep learning (DL) models have been proposed and shown promising performance. However, because of very limited available training samples and massive model parameters, DL methods may suffer from overfitting. In this paper, we propose an end-to-end 3-D lightweight convolutional neural network (CNN) (abbreviated as 3-D-LWNet) for limited samples-based HSI classification. Compared with conventional 3-D-CNN models, the proposed 3-D-LWNet has a deeper network structure, fewer parameters, and lower computation cost, resulting in better classification performance. To further alleviate the small sample problem, we also propose two transfer learning strategies: 1) cross-sensor strategy, in which we pretrain a 3-D model in the source HSI data sets containing a greater number of labeled samples and then transfer it to the target HSI data sets and 2) cross-modal strategy, in which we pretrain a 3-D model in the 2-D RGB image data sets containing a large number of samples and then transfer it to the target HSI data sets. In contrast to previous approaches, we do not impose restrictions on the source data sets: they do not have to be collected by the same sensors as the target data sets. Experiments on three public HSI data sets captured by different sensors demonstrate that our model achieves competitive performance for HSI classification compared to several state-of-the-art methods.
- Research Article
32
- 10.32604/cmes.2022.020601
- Jan 1, 2022
- Computer Modeling in Engineering & Sciences
Hyperspectral image (HSI) classification has been one of the most important tasks in the remote sensing community over the last few decades. Due to the presence of highly correlated bands and limited training samples in HSI, discriminative feature extraction was challenging for traditional machine learning methods. Recently, deep learning based methods have been recognized as powerful feature extraction tool and have drawn a significant amount of attention in HSI classification. Among various deep learning models, convolutional neural networks (CNNs) have shown huge success and offered great potential to yield high performance in HSI classification. Motivated by this successful performance, this paper presents a systematic review of different CNN architectures for HSI classification and provides some future guidelines. To accomplish this, our study has taken a few important steps. First, we have focused on different CNN architectures, which are able to extract spectral, spatial, and joint spectral-spatial features. Then, many publications related to CNN based HSI classifications have been reviewed systematically. Further, a detailed comparative performance analysis has been presented between four CNN models namely 1D CNN, 2D CNN, 3D CNN, and feature fusion based CNN (FFCNN). Four benchmark HSI datasets have been used in our experiment for evaluating the performance. Finally, we concluded the paper with challenges on CNN based HSI classification and future guidelines that may help the researchers to work on HSI classification using CNN.
- Research Article
17
- 10.3390/rs15071803
- Mar 28, 2023
- Remote Sensing
Hyperspectral images (HSI) contain powerful spectral characterization capabilities and are widely used especially for classification applications. However, the rich spectrum contained in HSI also increases the difficulty of extracting useful information, which makes the feature extraction method significant as it enables effective expression and utilization of the spectrum. Traditional HSI feature extraction methods design spectral features manually, which is likely to be limited by the complex spectral information within HSI. Recently, data-driven methods, especially the use of convolutional neural networks (CNNs), have shown great improvements in performance when processing image data owing to their powerful automatic feature learning and extraction abilities and are also widely used for HSI feature extraction and classification. The CNN extracts features based on the convolution operation. Nevertheless, the local perception of the convolution operation makes CNN focus on the local spectral features (LSF) and weakens the description of features between long-distance spectral ranges, which will be referred to as global spectral features (GSF) in this study. LSF and GSF describe the spectral features from two different perspectives and are both essential for determining the spectrum. Thus, in this study, a local-global spectral feature (LGSF) extraction and optimization method is proposed to jointly consider the LSF and GSF for HSI classification. To increase the relationship between spectra and the possibility to obtain features with more forms, we first transformed the 1D spectral vector into a 2D spectral image. Based on the spectral image, the local spectral feature extraction module (LSFEM) and the global spectral feature extraction module (GSFEM) are proposed to automatically extract the LGSF. The loss function for spectral feature optimization is proposed to optimize the LGSF and obtain improved class separability inspired by contrastive learning. 
We further enhanced the LGSF by introducing spatial relation and designed a CNN constructed using dilated convolution for classification. The proposed method was evaluated on four widely used HSI datasets, and the results highlighted its comprehensive utilization of spectral information as well as its effectiveness in HSI classification.
- Research Article
36
- 10.3390/rs15235483
- Nov 24, 2023
- Remote Sensing
Graph convolutional networks (GCNs) are a promising approach for addressing the necessity for long-range information in hyperspectral image (HSI) classification. Researchers have attempted to develop classification methods that combine strong generalizations with effective classification. However, the current HSI classification methods based on GCN present two main challenges. First, they overlook the multi-view features inherent in HSIs, whereas multi-view information interacts with each other to facilitate classification tasks. Second, many algorithms perform a rudimentary fusion of extracted features, which can result in information redundancy and conflicts. To address these challenges and exploit the strengths of multiple features, this paper introduces an adaptive multi-feature fusion GCN (AMF-GCN) for HSI classification. Initially, the AMF-GCN algorithm extracts spectral and textural features from the HSIs and combines them to create fusion features. Subsequently, these three features are employed to construct separate images, which are then processed individually using multi-branch GCNs. The AMF-GCN aggregates node information and utilizes an attention-based feature fusion method to selectively incorporate valuable features. We evaluated the model on three widely used HSI datasets, i.e., Pavia University, Salinas, and Houston-2013, and achieved accuracies of 97.45%, 98.03%, and 93.02%, respectively. Extensive experimental results show that the classification performance of the AMF-GCN on benchmark HSI datasets is comparable to those of state-of-the-art methods.
- Research Article
10
- 10.1109/lgrs.2022.3192832
- Jan 1, 2022
- IEEE Geoscience and Remote Sensing Letters
Recently, graph convolutional networks (GCNs) have been applied to hyperspectral image (HSI) classification and obtained better performance. The main issue in HSI classification is that the high-resolution HSI contains more complex spectral-spatial structure information. However, previous GCN-based methods for HSI classification adopted only a shallow GCN layer and cannot extract deeper discriminative features. In addition, these methods ignored the complementary and correlated information among the multi-order neighboring information extracted by multiple GCN layers. In this letter, a novel feature fusion via deep residual graph convolutional network is proposed to explore the internal relationship among HSI data. On the one hand, benefiting from residual learning to alleviate the over-smoothing problem, we can construct deep GCN layers to excavate deeper abstract features of HSI. On the other hand, we fuse the outputs of different GCN layers, and thus, the local structural information within multi-order neighborhood nodes can be fully utilized. Extensive experiments on four real HSI data sets, including Indian Pines, Pavia University, Salinas, and Houston University, demonstrate the superiority of the proposed method compared with other state-of-the-art methods in various evaluation criteria.
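The two ideas in the abstract above, residual connections between graph convolution layers and fusion of the outputs of different layers, can be sketched with the widely used symmetric-normalized graph convolution. This is an illustrative sketch, not the letter's architecture: the normalization choice, the ReLU activation, fusion by concatenation, and all function names are assumptions, and the weight matrices are kept square so the residual addition is well defined.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def residual_gcn_layer(h, a_norm, w):
    """One graph convolution with an identity skip connection.

    h: (nodes, features), w: (features, features).
    """
    return np.maximum(a_norm @ h @ w, 0.0) + h

def deep_residual_gcn(h, adj, weights):
    """Stack residual layers and fuse every layer's output.

    Concatenation is an assumed fusion rule chosen for this sketch.
    """
    a_norm = normalize_adjacency(adj)
    outputs = []
    for w in weights:
        h = residual_gcn_layer(h, a_norm, w)
        outputs.append(h)  # layer k aggregates k-hop neighbourhoods
    return np.concatenate(outputs, axis=1)

# Toy graph: two connected nodes, two features, two layers.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
features = np.eye(2)
fused = deep_residual_gcn(features, adj, [np.eye(2), np.eye(2)])
print(fused.shape)
```

The skip connection keeps each node's own representation in the sum, which is the mechanism the abstract credits with easing over-smoothing as depth grows.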
- Research Article
19
- 10.3390/rs15051206
- Feb 22, 2023
- Remote Sensing
Hyperspectral image (HSI) classification is a significant foundation for remote sensing image analysis, widely used in biology, aerospace, and other applications. Convolutional neural networks (CNNs) and attention mechanisms have shown outstanding ability in HSI classification and have been widely studied in recent years. However, the existing CNN-based and attention mechanism-based methods cannot fully use spatial–spectral information, which is not conducive to further improving HSI classification accuracy. This paper proposes a new spatial–spectral Transformer network with multi-scale convolution (SS-TMNet), which can effectively extract local and global spatial–spectral information. SS-TMNet includes two key modules, i.e., multi-scale 3D convolution projection module (MSCP) and spatial–spectral attention module (SSAM). The MSCP uses multi-scale 3D convolutions with different depths to extract the fused spatial–spectral features. The SSAM includes three branches: height spatial attention, width spatial attention, and spectral attention, which can extract the fusion information of spatial and spectral features. The proposed SS-TMNet was tested on three widely used HSI datasets: Pavia University, Indian Pines, and Houston2013. The experimental results show that the proposed SS-TMNet is superior to the existing methods.