Hyperspectral Image Classification Using Spectral–Spatial Token Enhanced Transformer With Hash-Based Positional Embedding
Hyperspectral image (HSI) classification aims to assign a land-cover category to each pixel. The transformer architecture has been successfully introduced to the HSI classification task with promising performance. However, existing transformer-based HSI classification methods still fail to fully exploit both the spectral and the spatial information in HSIs. To this end, we propose a Spectral-Spatial Token Enhanced Transformer (SSTE-Former) with hash-based positional embedding, which is the first method to exploit multiscale spectral-spatial information in depth for transformer-based HSI classification. Specifically, SSTE-Former accepts multiscale HSI cubes that are centered on the target pixel and preprocessed by PCA. Then, a purpose-designed multiscale CNN architecture extracts short-range spectral-spatial features and generates token embeddings. In parallel, a novel hash-based, spatially enhanced positional embedding tailored for HSI cubes is developed to model the correlations within and across multiscale token embeddings. Finally, the multiscale token embeddings and hash-based positional embeddings are concatenated, flattened, and fed into the transformer encoder for long-range spectral-spatial feature fusion. We conduct extensive experiments on four benchmark HSI datasets and achieve superior performance compared with state-of-the-art HSI classification methods.
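The input stage described in this abstract (PCA on the spectral axis, then several cubes of different sizes around one target pixel) can be illustrated with a minimal NumPy sketch. The function names, the patch sizes (5, 9, 13), the number of components, and the reflect padding at the image border are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np

def pca_reduce(hsi, n_components):
    """Project an (H, W, B) cube onto its top principal components."""
    h, w, b = hsi.shape
    flat = hsi.reshape(-1, b).astype(np.float64)  # astype copies, input is untouched
    flat -= flat.mean(axis=0)                     # center each band
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:n_components].T).reshape(h, w, n_components)

def multiscale_cubes(hsi, row, col, sizes=(5, 9, 13)):
    """Cut square patches of several odd sizes centered on one pixel."""
    pad = max(sizes) // 2
    # Reflect-pad spatially so border pixels still get full patches (assumption).
    padded = np.pad(hsi, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    r, c = row + pad, col + pad
    cubes = []
    for s in sizes:
        half = s // 2
        cubes.append(padded[r - half:r + half + 1, c - half:c + half + 1, :])
    return cubes

rng = np.random.default_rng(0)
image = rng.random((20, 20, 50))          # toy stand-in for an HSI scene
reduced = pca_reduce(image, n_components=8)
cubes = multiscale_cubes(reduced, row=0, col=3)
```

Each cube keeps the target pixel at its spatial center, so a downstream network can treat the cubes as views of the same location at different scales.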
- Conference Article
1
- 10.1145/3641584.3641609
- Sep 22, 2023
With continuous innovation in deep learning, introducing deep learning techniques into hyperspectral image (HSI) classification to improve accuracy has become a major research direction. Convolutional neural networks (CNNs) are among the most commonly used deep learning methods for visual data and are widely applied to HSI classification by virtue of their strong contextual modeling capability. The performance of HSI classification depends heavily on spatial and spectral information, yet current CNN-based HSI classification models extract spatial-spectral features insufficiently and fail to mine and represent the sequence properties of spectral features well. To address these problems, this paper proposes an HSI classification method that uses a 3D attention mechanism in collaboration with a Transformer. We introduce a variant Transformer model built on a hybrid of an improved 3D-CNN and a 2D-CNN. The model combines complementary spatial-spectral and spectral information through 3D and 2D convolutions, adds a variant attention module to strengthen spatial texture features, and combines a grouped transfer Transformer with skip connections so that lower layers can better learn upper-layer features. First, a variant channel attention mechanism is introduced into the 3D-CNN to enhance its acquisition of spectral information. Second, a variant spatial attention mechanism is introduced so that the 3D-CNN better captures the spatial information of hyperspectral images; the acquired spatial and spectral features are then passed to the 2D-CNN so that it can better capture local features. Finally, the acquired image features are passed to the variant Transformer model, which compensates for the CNN's restriction to local context and better captures global information over the feature sequences. The proposed model is evaluated on two hyperspectral datasets, Indian Pines and Pavia University (PU). On the PU dataset, the overall classification accuracy (OA), average classification accuracy (AA), and Kappa coefficient reach up to 99.59%, 99.31%, and 99.45%, respectively, an improvement in classification accuracy over current cutting-edge techniques.
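For readers unfamiliar with the three metrics reported here, the following sketch computes OA, AA, and Cohen's Kappa from a confusion matrix using their standard definitions. The function name and the toy matrix are illustrative assumptions and are unrelated to the paper's reported results.

```python
def classification_metrics(confusion):
    """Return (OA, AA, kappa) from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    # OA: fraction of all samples classified correctly.
    oa = correct / total
    # AA: mean of the per-class recalls, so small classes count equally.
    aa = sum(confusion[i][i] / sum(confusion[i]) for i in range(n_classes)) / n_classes
    # Kappa: agreement beyond what the row/column marginals give by chance.
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa

# Toy, imbalanced two-class example (made-up counts).
toy = [[80, 20],
       [2, 18]]
oa, aa, kappa = classification_metrics(toy)
```

On imbalanced data such as this toy matrix, OA and AA differ because AA weights each class equally regardless of its size.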
- Research Article
1
- 10.3390/rs16111888
- May 24, 2024
- Remote Sensing
In recent years, using deep neural networks for effective feature extraction and designing efficient, high-precision hyperspectral image classification algorithms have become research hotspots. However, because hyperspectral images are difficult to obtain and costly to annotate, training samples are very limited. To cope with this small-sample problem, researchers often deepen the network and use attention mechanisms to extract features; however, as the network deepens, gradients vanish, feature extraction ability becomes insufficient, and the computational cost rises. How to make full use of the spectral and spatial information in limited samples has therefore become a difficult problem. To address it, this paper proposes FHDANet, a hyperspectral image classification model based on two-branch multiscale spatial–spectral feature aggregation with a self-attention mechanism. The model constructs a dense two-branch pyramid structure, which extracts joint spatial–spectral feature information and spectral feature information efficiently, greatly reduces feature loss, and strengthens the model’s ability to extract contextual information. A channel–space attention module, ECBAM, is proposed, which greatly improves the model’s ability to extract salient features. A spatial information extraction module based on the deep feature fusion strategy HLDFF is also proposed; it fully strengthens feature reusability and mitigates the feature loss caused by deepening the model. Compared with the hyperspectral image classification algorithms SVM, SSRN, A2S2K-ResNet, HyBridSN, SSDGL, RSSGL, and LANet, the method significantly improves classification performance on four representative datasets. Experiments demonstrate that FHDANet better extracts and utilises the spatial and spectral information in hyperspectral images and achieves excellent classification performance under small-sample conditions.
- Research Article
33
- 10.1109/lgrs.2018.2800080
- Apr 1, 2018
- IEEE Geoscience and Remote Sensing Letters
Recently, collaborative representation has received much attention in hyperspectral image (HSI) classification due to its simplicity and effectiveness. However, existing collaborative representation-based HSI classification methods ignore the correlation among different classes. To overcome this problem, we propose a discriminative kernel collaborative representation and Tikhonov regularization method (DKCRT) for HSI classification, which makes the kernel collaborative representations of different classes more discriminative. Specifically, the kernel trick is adopted to map the original HSI into a high-dimensional feature space to improve class separability. In addition, distance-weighted kernel Tikhonov regularization is adopted to encourage large representation coefficients for the training samples that are similar to the test sample in the high-dimensional feature space. Moreover, we add a discriminative regularization term that takes the correlation among different classes into consideration and further enhances class separability. Furthermore, to take the spatial information of the HSI into consideration, we extend DKCRT to a joint version named JDKCRT. Experiments on real HSIs demonstrate the efficiency of the proposed DKCRT and JDKCRT.
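As background for the method above, here is a rough sketch of plain kernel collaborative representation with a distance-weighted Tikhonov penalty, the baseline that DKCRT builds on. It deliberately omits the paper's discriminative regularization term and the joint spatial extension, whose exact forms the abstract does not give. The RBF kernel, the values of `lam` and `gamma`, and the toy data are assumptions.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kcrt_predict(train_x, train_y, test_x, lam=0.1, gamma=1.0):
    """Classify one sample by kernel collaborative representation with a
    distance-weighted Tikhonov penalty (no discriminative term)."""
    K = rbf(train_x, train_x, gamma)                 # (n, n) Gram matrix
    k = rbf(train_x, test_x[None, :], gamma)[:, 0]   # (n,) kernel vector
    # Squared feature-space distance to each training sample;
    # k(y, y) = 1 for the RBF kernel.
    dist2 = np.clip(1.0 - 2.0 * k + np.diag(K), 0.0, None)
    # Solve (K + lam * Gamma^T Gamma) alpha = k, with Gamma = diag(dist).
    alpha = np.linalg.solve(K + lam * np.diag(dist2), k)
    residuals = {}
    for c in np.unique(train_y):
        m = train_y == c
        a_c = alpha[m]
        # ||phi(y) - Phi_c a_c||^2 expanded with kernel evaluations only.
        residuals[int(c)] = 1.0 - 2.0 * a_c @ k[m] + a_c @ K[np.ix_(m, m)] @ a_c
    # Assign the class whose training samples reconstruct the test sample best.
    return min(residuals, key=residuals.get)

# Toy data: two well-separated clusters standing in for two classes.
train_x = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
                    [5.0, 5.0], [5.5, 5.0], [5.0, 5.5]])
train_y = np.array([0, 0, 0, 1, 1, 1])
pred_a = kcrt_predict(train_x, train_y, np.array([0.2, 0.2]))
pred_b = kcrt_predict(train_x, train_y, np.array([5.2, 5.1]))
```

The distance weights penalize coefficients on training samples far from the test sample, so nearby samples dominate the representation.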
- Conference Article
1
- 10.1109/icicsp55539.2022.10050698
- Nov 26, 2022
Hyperspectral image (HSI) classification is a key technology in remote sensing image processing. In recent years, the convolutional neural network (CNN), a powerful feature extractor, has been introduced into the field of HSI classification. Since the features of an HSI are the basis of its classification, how to effectively extract spectral-spatial features from HSIs with CNNs has become a research hotspot. HSI feature extraction networks based on two-dimensional (2D) and three-dimensional (3D) CNNs can extract both spectral and spatial information, but they may increase the number of parameters and the computational cost. Compared with 2D and 3D CNNs, a one-dimensional (1D) CNN has far fewer parameters and a much lower computational cost. However, 1D CNN-based algorithms can only extract spectral information and do not consider spatial information. Therefore, in this paper, a lightweight multilevel feature fusion network (LMFFN) is proposed for HSI classification, which aims to extract spectral-spatial features efficiently while minimizing the number of parameters. The main contributions of this paper are twofold. First, we design a hybrid spectral-spatial feature extraction network (HSSFEN) that combines the advantages of 1D, 2D, and 3D CNNs. It adopts depthwise separable convolution, which effectively reduces the complexity of the proposed HSSFEN. Second, a multilevel spectral-spatial feature fusion network (MSSFFN) is proposed to obtain more effective spectral-spatial features by fusing the bottom-level and top-level spectral-spatial features. To demonstrate the performance of the proposed method, a series of experiments is conducted on three HSI datasets: Indian Pines, University of Pavia, and Salinas Scene. The experimental results indicate that the proposed LMFFN achieves better performance than both manual feature extraction methods and deep learning methods, which demonstrates the superiority of the proposed method.
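The parameter saving from depthwise separable convolution, which the abstract credits for the reduced complexity of HSSFEN, follows from simple arithmetic. The sketch below compares weight counts for one 2D layer, ignoring biases; the layer sizes are arbitrary assumptions and are not taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k 2D convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in     # one k x k filter per input channel
    pointwise = c_in * c_out     # 1 x 1 conv that mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 128      # hypothetical layer sizes
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
ratio = sep / std                # equals 1 / c_out + 1 / k**2
```

For these sizes the separable layer needs 8,768 weights instead of 73,728, about 12% of the standard layer.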
- Research Article
- 10.1109/jstars.2025.3552817
- Jan 1, 2025
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
MLP-like models have shown strong potential in hyperspectral image (HSI) classification. However, their dense connections among all neurons (tokens) lead to large model sizes, high computational costs, and increased risk of overfitting. To address these issues, researchers have proposed sparse connectivity strategies to create more compact MLP models by selecting and mixing only a subset of tokens. However, most token selection rules overlook image patch content, often introducing task-irrelevant tokens with little valuable class distribution information. This problem is particularly severe in HSIs, which contain rich spatial and spectral information. To overcome this, we propose an adaptive token mixer (ATM) to effectively integrate spatial information in HSIs. ATM adaptively learns token positions based on their content, enabling the model to identify relevant tokens and capture global spatial information across the entire spatial domain. In addition, we introduce a cross-shaped convolutional operator (COSTCO) to enhance local spatial feature extraction. The combination of ATM and COSTCO enables comprehensive token mixing by integrating both global and local spatial information. Experimental results show that this proposed adaptive MLP focuses on the most informative, task-relevant regions during decision-making, offering interpretability to help users understand its predictions. Moreover, the adaptive MLP achieves state-of-the-art performance on HSI classification tasks across four publicly available datasets.
- Research Article
10
- 10.1109/jstars.2021.3123371
- Jan 1, 2021
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Over the past few years, convolutional neural networks (CNNs) have been broadly adopted in remote sensing (RS) imagery processing due to their impressive feature extraction capabilities. Nevertheless, it is still a challenge for CNN-based hyperspectral image (HSI) classification methods to extract more effective spectral-spatial features that consider all spectral bands. Driven by this issue, we propose a novel approach to the HSI classification task, referred to as the multi-level joint feature extraction network (MJFEN). The proposed network makes full use of the information in each channel of the HSI and transforms it into valid channel-wise spatial features through a designed convolution process. Moreover, these feature maps form global attention details to guide the extraction of spectral-spatial features, which are passed to the next level for further feature mining. Then, the features obtained at different levels are integrated for ground object classification. In comparison with several state-of-the-art HSI classification methods on four public datasets, experimental results demonstrate the effectiveness and remarkable feature extraction capability of our proposed approach.
- Research Article
8
- 10.1080/01431161.2024.2370501
- Jul 5, 2024
- International Journal of Remote Sensing
Graph neural networks (GNNs) have recently garnered significant attention due to their exceptional performance across various applications, including hyperspectral (HS) image classification. However, most existing GNN-based models for HS image classification are limited depth models and often suffer from performance degradation as model depth increases. This study introduces HyperGCN, an exclusive GNN-based model designed with multiple graph convolutional layers to exploit the rich spectral information inherent in HS images, thereby enhancing classification performance. To address performance degradation, HyperGCN incorporates techniques resistant to oversmoothing into its architecture. Additionally, multiple-side exit branches are integrated into the intermediate layers of HyperGCN, enabling dynamic management of the complexity of HS images. Less complex HS images are processed by fewer layers, exiting early via attached branches, while more complex images traverse multiple layers until reaching the final output layer. Extensive experiments on four benchmark HS datasets (Indian Pines, Pavia University, Salinas, and Botswana) demonstrate HyperGCN’s superior performance over basic GNN-based models. Notably, HyperGCN outperforms or performs comparably to the CNN-GNN combined model in classifying HS images. Furthermore, the superior performance of multi-exit HyperGCN over its single-exit counterpart emphasizes the effectiveness of incorporating side exit branches in GNN-based HS image classification. Compared to state-of-the-art models, multi-exit HyperGCN demonstrates competitive performance, highlighting its effectiveness in handling complex spectral information in HS images while maintaining an acceptable balance between accuracy and computational efficiency.
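The multi-exit idea in this abstract, where easier inputs leave the network through an earlier branch, can be sketched generically. The abstract does not state the exit criterion, so the confidence threshold below is an assumption. The scalar "layers" and "heads" are toy stand-ins rather than graph convolutions.

```python
import math

def early_exit_predict(x, layers, exit_heads, threshold=0.9):
    """Run layers in order and stop at the first sufficiently confident exit.

    layers: non-empty list of callables mapping features to features.
    exit_heads: one callable per layer mapping features to class probabilities.
    Returns (predicted_class, layers_used).
    """
    probs, depth = None, 0
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        x = layer(x)
        probs = head(x)
        if max(probs) >= threshold:   # confident enough: take this side exit
            break
    return probs.index(max(probs)), depth

# Toy stand-ins: each layer doubles a scalar feature, and each exit head
# squashes it into two class probabilities with a sigmoid.
def double(v):
    return 2.0 * v

def head(v):
    p = 1.0 / (1.0 + math.exp(-v))
    return [1.0 - p, p]

layers = [double, double, double]
heads = [head, head, head]
easy = early_exit_predict(3.0, layers, heads)   # strong evidence from the start
hard = early_exit_predict(0.5, layers, heads)   # weak evidence, needs more depth
```

In this toy run the easy input exits after one layer, while the hard input passes through all three before its confidence clears the threshold.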
- Research Article
51
- 10.1016/j.patrec.2018.08.032
- Aug 27, 2018
- Pattern Recognition Letters
A spatial-spectral SIFT for hyperspectral image matching and classification
- Research Article
16
- 10.1109/jstars.2024.3491294
- Jan 1, 2025
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Hyperspectral image (HSI) classification is a challenging task in remote sensing applications, aiming to determine the category of each pixel by utilizing the rich spectral and spatial information in HSIs. Convolutional neural networks (CNNs) have been effective in processing HSI data by extracting local features, but they are deficient in capturing global contextual information. Recently, transformers have proven proficient at attending to global information thanks to their self-attention mechanisms, yet they may fall short in capturing the multiscale features of HSIs. To address these limitations, a global–local multigranularity transformer (GLMGT) network is proposed for HSI classification. The GLMGT combines a CNN with the transformer to comprehensively capture multigranularity spectral and spatial features across global and local scales. Specifically, we introduce a multigranularity spatial feature extraction block to extensively extract spatial information at different granularities, including multiscale local spatial features and global spatial features. In addition, we introduce a multigranularity spectral feature extraction block to fully leverage spectral information across different granularities. The validity of the proposed method is demonstrated through experiments on seven publicly available datasets, which include two Chinese satellite hyperspectral datasets (ZY1-02D Huanghekou and GF-5 Yancheng) and one UAV-based hyperspectral dataset.
- Conference Article
7
- 10.1109/fskd.2017.8393336
- Jul 1, 2017
Hyperspectral image (HSI) classification is one of the most persistent issues in the remote sensing field. Recently, deep learning has attracted attention in HSI classification due to its accuracy and strong generalization. This paper proposes a new spectral-spatial HSI classification approach that combines deep feature extraction based on stacked autoencoders (SAE) with segmentation based on a hidden Markov random field (HMRF). First, the SAE model is implemented as a spectral information-based classifier to extract deep spectral features. Second, spatial information is obtained using an effective HMRF-based segmentation technique. Finally, a maximum-voting criterion is employed to merge the extracted spectral and spatial information, which yields a precise spectral-spatial HSI classification. Characterizing the HSI with spectral-spatial features leads to a more comprehensive analysis of the HSI and a more accurate classification. In general, using the spectral information from the SAE process, the spatial information from HMRF-based segmentation, and the maximum-voting criterion to merge them has a significant effect on the accuracy of HSI classification. Experiments on diverse real hyperspectral data sets with different contexts and resolutions, acquired by the AVIRIS and ROSIS sensors, show the accuracy of the proposed method and confirm that its results are comparable to those of several recently proposed HSI classification techniques.
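The final fusion step can be illustrated under one common reading of "maximum voting": every pixel in a segment receives the class that the spectral classifier predicted most often within that segment. This is an assumption about the criterion, and the labels and segment ids below are made up.

```python
from collections import Counter

def majority_vote(pixel_labels, segment_ids):
    """Relabel every pixel with the most common class in its segment."""
    votes = {}
    for label, seg in zip(pixel_labels, segment_ids):
        votes.setdefault(seg, Counter())[label] += 1
    # most_common(1) breaks ties in favor of the label seen first.
    winner = {seg: counts.most_common(1)[0][0] for seg, counts in votes.items()}
    return [winner[seg] for seg in segment_ids]

# Per-pixel labels from a spectral classifier (hypothetical) and the
# segment each pixel belongs to after spatial segmentation (hypothetical).
spectral = ["corn", "corn", "soy", "corn", "soy", "soy", "corn", "soy"]
segments = [0, 0, 0, 0, 1, 1, 1, 1]
fused = majority_vote(spectral, segments)
```

Isolated misclassifications inside an otherwise homogeneous segment are overwritten, which is how the spatial information smooths the purely spectral result.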
- Research Article
69
- 10.3390/rs12122033
- Jun 24, 2020
- Remote Sensing
Accurate hyperspectral image classification has been an important yet challenging task for years. With the recent success of deep learning in various tasks, 2-dimensional (2D)/3-dimensional (3D) convolutional neural networks (CNNs) have been exploited to capture spectral or spatial information in hyperspectral images. On the other hand, few approaches make use of both spectral and spatial information simultaneously, which is critical to accurate hyperspectral image classification. This paper presents a novel Synergistic Convolutional Neural Network (SyCNN) for accurate hyperspectral image classification. The SyCNN consists of a hybrid module that combines 2D and 3D CNNs in feature learning and a data interaction module that fuses spectral and spatial hyperspectral information. Additionally, it introduces a 3D attention mechanism before the fully-connected layer which helps filter out interfering features and information effectively. Extensive experiments over three public benchmarking datasets show that our proposed SyCNNs clearly outperform state-of-the-art techniques that use 2D/3D CNNs.
- Research Article
2
- 10.1080/2150704x.2015.1034883
- Apr 3, 2015
- Remote Sensing Letters
In this article, a spatially constrained random walker approach is proposed for hyperspectral image (HSI) classification. The proposed method uses both spectral and spatial information. Image pixels are partitioned into two sets, a labelled set and an unlabelled set, and the method aims to label all the unlabelled pixels. It consists of two steps. In the first step, the random walker uses the spectral information to compute the posterior probability that an unlabelled pixel has the same label as a labelled pixel. In the second step, a Markov random field is applied to account for spatial information and improve the classification accuracy. The developed method is evaluated on HSIs, and the experimental results are compared with those obtained using other HSI classification methods. The proposed approach performs better than the other methods in terms of classification accuracy.
- Research Article
71
- 10.1016/j.patrec.2018.10.003
- Oct 5, 2018
- Pattern Recognition Letters
Joint spatial-spectral hyperspectral image classification based on convolutional neural network
- Research Article
10
- 10.1016/j.patrec.2023.12.023
- Jan 5, 2024
- Pattern Recognition Letters
SemanticFormer: Hyperspectral image classification via semantic transformer
- Conference Article
6
- 10.1109/igarss.2019.8898161
- Jul 1, 2019
A hyperspectral image is usually composed of hundreds of bands rich in spatial and spectral information, which is an advantage over common remotely sensed data. Thus, the classification of hyperspectral images can be of great value. However, the high dimensionality of a hyperspectral image may lead to the curse of dimensionality when the image is used directly for land use classification or other applications, making it difficult to utilize effectively. In this paper, we present a novel classification framework with a capsule network based on the spectral and spatial information of hyperspectral images. First, we use principal component analysis (PCA) to reduce the dimensionality of the hyperspectral image. Then, we use the capsule network to classify the hyperspectral image. Our experimental results show that the novel classification framework is more efficient than six other popular methods. Therefore, the capsule network method is robust for hyperspectral image classification.