Residual Spectral–Spatial Attention Network for Hyperspectral Image Classification
In the last five years, deep learning has been introduced to tackle hyperspectral image (HSI) classification and has demonstrated good performance. In particular, convolutional neural network (CNN)-based methods for HSI classification have made great progress. However, due to the high dimensionality of HSI and the equal treatment of all bands, the performance of these methods is hampered by learning features from bands that are useless for classification. Moreover, for patchwise CNN models, the equal treatment of spatial information from the pixel-centered neighborhood also hinders performance. In this article, we propose an end-to-end residual spectral-spatial attention network (RSSAN) for HSI classification. The RSSAN takes raw 3-D cubes as input data without additional feature engineering. First, a spectral attention module is designed for spectral band selection from the raw input data, emphasizing bands that are useful for classification and suppressing useless ones. Then, a spatial attention module is designed for the adaptive selection of spatial information, emphasizing pixels in the pixel-centered neighborhood that belong to the same class as the center pixel or are otherwise useful for classification, and suppressing those from a different class or that are useless. Second, the two attention modules are also used in the following CNN for adaptive feature refinement during spectral-spatial feature learning. Third, the sequential spectral-spatial attention module is embedded into a residual block to avoid overfitting and accelerate the training of the proposed model. Experimental studies demonstrate that the RSSAN achieved superior classification accuracy compared with the state of the art on three HSI data sets: Indian Pines (IN), University of Pavia (UP), and Kennedy Space Center (KSC).
- Research Article
- 10.3390/rs12122035
- Jun 24, 2020
- Remote Sensing
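The band-weighting idea in the RSSAN abstract above (emphasize useful bands, suppress useless ones) can be sketched as a squeeze-and-excitation-style gate. This is a minimal NumPy illustration of the general mechanism, not the authors' exact module; the average pooling, bottleneck size, and sigmoid gating are our assumptions.

```python
import numpy as np

def spectral_attention(cube, w1, w2):
    """Reweight the bands of an HSI patch (H, W, B) by a learned gate.

    Squeeze: global average pooling over the spatial dims gives one
    descriptor per band.  Excitation: a two-layer bottleneck MLP with a
    sigmoid produces a weight in (0, 1) for each band.
    """
    squeeze = cube.mean(axis=(0, 1))              # (B,) per-band descriptor
    hidden = np.maximum(squeeze @ w1, 0.0)        # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # (B,) sigmoid weights
    return cube * gate                            # emphasize useful bands

# Toy example: 5x5 patch with 8 bands, bottleneck of 4 units.
rng = np.random.default_rng(0)
cube = rng.standard_normal((5, 5, 8))
w1 = rng.standard_normal((8, 4)) * 0.1
w2 = rng.standard_normal((4, 8)) * 0.1
out = spectral_attention(cube, w1, w2)
assert out.shape == cube.shape
```

In a trained network `w1` and `w2` would be learned from data; here they are random, so the example only demonstrates the shapes and the band-gating behaviour.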
Recently, deep learning methods based on three-dimensional (3-D) convolution have been widely used in hyperspectral image (HSI) classification tasks and have shown good classification performance. However, affected by the irregular distribution of the various classes in HSI datasets, most previous 3-D convolutional neural network (CNN)-based models require more training samples to obtain better classification accuracies. In addition, as the network deepens, the spatial resolution of the feature maps gradually decreases, so much useful information may be lost during the training process. Therefore, how to ensure efficient network training is key to HSI classification tasks. To address the issue mentioned above, in this paper we propose a 3-D-CNN-based residual group channel and space attention network (RGCSA) for HSI classification. Firstly, the proposed bottom-up top-down attention structure with residual connections improves network training efficiency by optimizing channel-wise and spatial-wise features throughout the whole training process. Secondly, the proposed residual group channel-wise attention module reduces the possibility of losing useful information, and the novel spatial-wise attention module extracts context information to strengthen the spatial features. Furthermore, our proposed RGCSA network needs only a few training samples to achieve higher classification accuracies than previous 3-D-CNN-based networks. The experimental results on three commonly used HSI datasets demonstrate the superiority of our proposed network based on the attention mechanism and the effectiveness of the proposed channel-wise and spatial-wise attention modules for HSI classification. The code and configurations are released at Github.com.
- Research Article
- 10.1080/01431161.2023.2249598
- Sep 8, 2023
- International Journal of Remote Sensing
Convolutional neural networks (CNNs) have effectively advanced hyperspectral image (HSI) classification. Although many CNN-based models can extract local features in HSI, it is difficult for them to extract global features. With its ability to capture long-range dependencies, the Transformer is gradually gaining prominence in HSI classification, but it may overlook some local details when extracting features. To address these issues, we propose a CNN and Transformer interaction network (CTIN) for HSI classification. Firstly, a dual-branch structure is constructed in which the CNN and Transformer are arranged in parallel to simultaneously extract global and local features in HSI. Secondly, a feature interaction module is introduced between the two branches, facilitating a bi-directional flow of information between the global and local feature spaces. In this way, the network structure combines the advantages of the CNN and Transformer in extracting features as much as possible. In addition, a token generation method is designed to harness abundant contextual information relevant to the centre pixel and improve the accuracy of the final classification. Experiments were conducted on four hyperspectral datasets (two classical datasets – Indian Pines and Salinas Valley, a new satellite dataset – Yellow River, and a self-made UAV dataset – Yellow River Willow). Experimental results show that the proposed method outperforms other state-of-the-art methods, with overall accuracies of 99.21%, 99.61%, 92.40%, and 98.17%, respectively.
- Research Article
- 10.1080/01431161.2024.2398822
- Oct 3, 2024
- International Journal of Remote Sensing
Recently, the convolutional neural network (CNN) has gained widespread adoption in hyperspectral image (HSI) classification owing to its remarkable feature extraction capability. However, the fixed receptive field of the CNN restricts it to Euclidean image data only, making it difficult to capture complex information in hyperspectral data. To overcome this problem, much attention has been paid to the graph attention network (GAT), which can effectively model graph structure and capture complex dependencies between nodes. However, the GAT usually acts on superpixel nodes, which may lead to the loss of pixel-level information. To better integrate the advantages of both, we propose a CNN-assisted multi-hop graph attention network (CMGAT) for HSI classification. Specifically, a parallel dual-branch architecture is first constructed to simultaneously capture spectral-spatial features from hyperspectral data at the superpixel and pixel levels using the GAT and CNN, respectively. On this basis, multi-hop and multi-scale mechanisms are further employed to construct a multi-hop GAT module and a multi-scale CNN module that capture diverse feature information. Secondly, an attention module is cascaded before the multi-scale CNN module to improve classification performance. Eventually, the output information from the two branches is weighted and fused to produce the classification result. We performed experiments on four benchmark HSI datasets, including Indian Pines (IP), University of Pavia (UP), Salinas Valley (SV), and WHU-Hi-LongKou (LK). The results demonstrate that the proposed method outperforms several deep learning methods, achieving overall accuracies of 95.67%, 99.04%, 99.55%, and 99.51%, respectively, even with fewer training samples.
- Research Article
- 10.1109/jstars.2022.3197934
- Jan 1, 2022
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
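The graph attention mechanism that CMGAT builds on can be illustrated with a single-head GAT layer in NumPy: project node features, score neighbour pairs, softmax over each node's neighbourhood, and aggregate. This is a generic sketch of the standard GAT update, not the paper's multi-hop module; a multi-hop variant would additionally attend over powers of the adjacency matrix.

```python
import numpy as np

def gat_layer(x, adj, w, a_src, a_dst):
    """One single-head graph attention layer.

    x: (N, F) node features; adj: (N, N) adjacency with self-loops;
    w: (F, F') projection; a_src/a_dst: (F',) halves of the attention vector.
    """
    h = x @ w                                   # (N, F') projected features
    # e_ij = LeakyReLU(a_src . h_i + a_dst . h_j), masked by the adjacency
    e = (h @ a_src)[:, None] + (h @ a_dst)[None, :]
    e = np.where(e > 0, e, 0.2 * e)             # LeakyReLU, slope 0.2
    e = np.where(adj > 0, e, -1e9)              # only attend to neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)   # row-wise softmax
    return alpha @ h                            # aggregated node features

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 4))
# A 6-node path graph with self-loops as a toy neighbourhood structure.
adj = np.eye(6) + np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
out = gat_layer(x, adj, rng.standard_normal((4, 3)),
                rng.standard_normal(3), rng.standard_normal(3))
assert out.shape == (6, 3)
```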
Deep learning has achieved good performance in hyperspectral image classification (HSIC). Many deep learning-based methods use deep and complex network structures to extract rich spectral and spatial features of hyperspectral images (HSIs) with high accuracy. In this process, accurately extracting features and information from pixel blocks in HSIs is important. When all spectral features are treated equally in classification and the network input contains much useless pixel information, classification accuracy suffers. To solve this problem, an enhanced spectral-spatial residual attention network (ESSRAN) is proposed for HSIC in this paper. In the proposed network, a spectral-spatial attention network (SSAN), a residual network (ResNet), and long short-term memory (LSTM) are combined to extract more discriminative spectral and spatial features. More specifically, the SSAN is first applied to extract image features, using the spectral attention module to emphasize useful bands and suppress useless ones, and the spatial attention module to emphasize pixels of the same category as the central pixel. Then, these features are fed into an improved ResNet, which adopts LSTM to learn representative high-level semantic features of the spectral sequences; the use of ResNet prevents gradient vanishing and explosion. The proposed ESSRAN model is implemented on three commonly used HSI datasets and compared with some state-of-the-art methods. The results confirm that ESSRAN effectively improves accuracy.
- Research Article
- 10.1109/access.2022.3144393
- Jan 1, 2022
- IEEE Access
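The spatial attention idea in the ESSRAN abstract above — emphasizing pixels of the same category as the central pixel — can be approximated by weighting each pixel by the similarity of its spectrum to the center pixel's. This NumPy sketch uses cosine similarity and a sigmoid, which are our choices for illustration, not the paper's exact formulation.

```python
import numpy as np

def spatial_attention(cube):
    """Weight each pixel in a patch by spectral similarity to the center.

    Pixels whose spectra resemble the center pixel's (cosine similarity)
    receive weights near 1; dissimilar pixels are suppressed.
    """
    h, w, b = cube.shape
    center = cube[h // 2, w // 2]               # (B,) center spectrum
    flat = cube.reshape(-1, b)
    cos = flat @ center / (np.linalg.norm(flat, axis=1)
                           * np.linalg.norm(center) + 1e-12)
    weights = 1.0 / (1.0 + np.exp(-5.0 * cos))  # sigmoid sharpening
    return cube * weights.reshape(h, w, 1)

rng = np.random.default_rng(2)
cube = rng.standard_normal((5, 5, 8))
out = spatial_attention(cube)
# The center pixel has cosine similarity 1 with itself, so it gets the
# largest possible weight; unrelated neighbours are down-weighted.
assert out.shape == cube.shape
```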
Hyperspectral image (HSI) classification has become a research hotspot. Recently, deep learning-based methods have achieved preferable performance, as deep spectral-spatial features can be extracted from HSI cubes. However, in complex scenes, due to the diversity of land-cover types and the high dimensionality of the bands, these methods are often hampered by irrelevant spatial areas and redundant bands, which results in indistinguishable features and restricted performance. In this article, a spatial attention guided residual attention network (SpaAG-RAN) is proposed for HSI classification, which contains a spatial attention module (SpaAM), a spectral attention module (SpeAM), and a spectral-spatial feature extraction module (SSFEM). Based on spectral similarity, the SpaAM is capable of capturing the relevant spatial areas, composed of pixels of the same category as the center pixel, from the HSI cube with a novel inverted-shifted-scaled sigmoid activation function. The SpeAM aims to select the bands that benefit the spectral feature representation. The SSFEM is exploited to extract the discriminating spectral-spatial features. To facilitate the processes of band selection and feature extraction, two well-designed spatial attention masks generated by the SpaAM are employed to guide the SpeAM and the SSFEM, respectively. Moreover, a spatial consistency loss function is introduced to maintain consistency between the two spatial attention masks so that the network can distinguish the relevant features exactly. Experimental results on three HSI data sets show that the proposed SpaAG-RAN model can extract discriminating spectral-spatial features and outperforms the state-of-the-art methods.
- Research Article
- 10.3390/rs15174219
- Aug 28, 2023
- Remote Sensing
Hyperspectral image (HSI) classification is a vital task in hyperspectral image processing and applications. Convolutional neural networks (CNNs) are becoming an effective approach for categorizing hyperspectral remote sensing images as deep learning technology advances. However, a traditional CNN usually uses a fixed kernel size, which limits the model's capacity to acquire new features and affects the classification accuracy. Based on this, we developed a spectral segmentation-based multi-scale spatial feature extraction residual network (MFERN) for hyperspectral image classification. MFERN divides the input data into multiple non-overlapping sub-bands along the spectral dimension, extracts features in parallel using the multi-scale spatial feature extraction module MSFE, and adds a global branch on top of this to obtain global information over the full spectral range of the image. Finally, the extracted features are fused and sent into the classifier. Our MSFE module has multiple branches with increasing receptive field (RF) ranges, enabling multi-scale spatial information extraction at both fine- and coarse-grained levels. We conducted extensive experiments on the Indian Pines (IP), Salinas (SA), and Pavia University (PU) HSI datasets. The experimental results show that our model has the best performance and robustness, and our proposed MFERN significantly outperforms other models in terms of classification accuracy, even with a small amount of training data.
- Research Article
- 10.3390/rs13163055
- Aug 4, 2021
- Remote Sensing
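MFERN's two key ingredients — non-overlapping spectral sub-bands and parallel branches with growing receptive fields — can be sketched in plain NumPy. Mean pooling stands in for the paper's multi-scale convolutions, and the group count and window sizes below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def split_subbands(cube, groups):
    """Split an HSI patch (H, W, B) into non-overlapping spectral groups."""
    h, w, b = cube.shape
    size = b // groups
    return [cube[:, :, i * size:(i + 1) * size] for i in range(groups)]

def avg_pool(x, k):
    """k x k spatial mean pooling as a stand-in for a larger receptive field."""
    h, w, b = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode='edge')
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].mean(axis=(0, 1))
    return out

rng = np.random.default_rng(3)
cube = rng.standard_normal((7, 7, 12))
# Each spectral group goes through a branch with a different window size,
# giving fine- and coarse-grained spatial context in parallel.
branches = [avg_pool(g, k) for g, k in zip(split_subbands(cube, 3), (1, 3, 5))]
fused = np.concatenate(branches, axis=2)   # fuse the multi-scale branches
assert fused.shape == cube.shape
```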
Convolutional neural networks (CNNs) have achieved great results in hyperspectral image (HSI) classification in recent years. However, convolution kernels are reused across different spatial locations, a property known as spatial-agnostic or weight-sharing kernels. Furthermore, the preference for spatial compactness in convolution (typically, a 3×3 kernel size) constrains the receptive field and the ability to capture long-range spatial interactions. To mitigate these two issues, in this article we combine a novel operation called involution with residual learning and develop a new deep residual involution network (DRIN) for HSI classification. The proposed DRIN can model long-range spatial interactions well by adopting enlarged involution kernels and realizes feature learning in a fairly lightweight manner. Moreover, the vast and dynamic involution kernels are distinct across different spatial positions, which prioritizes informative visual patterns in the spatial domain according to the spectral information of the target pixel. The proposed DRIN achieves better classification results than both traditional machine learning-based and convolution-based methods on four HSI datasets. In particular, compared with the convolutional baseline model, i.e., the deep residual network (DRN), our involution-powered DRIN model increases the overall classification accuracy by 0.5%, 1.3%, 0.4%, and 2.3% on the University of Pavia, the University of Houston, the Salinas Valley, and the recently released HyRANK HSI benchmark datasets, respectively, demonstrating the potential of involution for HSI classification.
- Research Article
- 10.1109/access.2023.3253627
- Jan 1, 2023
- IEEE Access
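Involution, the operation DRIN is built on, inverts convolution's design: the kernel is generated from each position's own feature vector and shared across channels, rather than shared across positions and distinct per channel. A minimal single-group NumPy sketch follows; the single linear kernel-generation map is a simplification of the two-layer generator used in practice.

```python
import numpy as np

def involution(x, w_gen, k):
    """Minimal single-group involution over a feature map x of shape (H, W, C).

    Unlike convolution, the k*k kernel at each position is *generated*
    from that position's own feature vector (here by a linear map w_gen
    of shape (C, k*k)) and shared across all channels, so kernels vary
    spatially and adapt to the target pixel's spectrum.
    """
    h, w, c = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            kernel = (x[i, j] @ w_gen).reshape(k, k, 1)   # position-specific
            out[i, j] = (xp[i:i + k, j:j + k] * kernel).sum(axis=(0, 1))
    return out

rng = np.random.default_rng(4)
x = rng.standard_normal((6, 6, 8))
w_gen = rng.standard_normal((8, 9)) * 0.1      # generates a 3x3 kernel
out = involution(x, w_gen, 3)
assert out.shape == x.shape
```

Note the parameter count: a single (C, k*k) generator serves every spatial position, which is what makes the operation lightweight relative to a full convolution bank.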
Over the past few years, deep learning has been introduced to tackle hyperspectral image (HSI) classification and has demonstrated good performance. In particular, convolutional neural network (CNN)-based methods have made notable progress. However, due to the high dimensionality of HSI and the equal treatment of all bands, the performance of CNN-based methods is hampered. In pixel-centered spatial information, the land-cover labels of edge pixels often differ from that of the center pixel; these edge pixels may weaken the discrimination of spatial features and reduce classification accuracy. Motivated by the attention mechanism of the human visual system, a spatial proximity feature selection with residual spatial-spectral attention network is proposed in this article. It contains a residual spatial attention module, a residual spectral attention module, and a spatial proximity feature selection module. The residual spatial attention module aims to select the crucial spatial information, assigning weights to different features by measuring the similarity between the surrounding elements and the central one. The residual spectral attention module selects spectral bands from the raw input data by emphasizing the valuable bands and suppressing the valueless ones. According to the spatial distribution of features, the spatial proximity feature selection module is used to filter features effectively. Experiments on three public data sets demonstrate that the proposed network outperforms state-of-the-art methods.
- Research Article
- 10.3390/rs16224202
- Nov 11, 2024
- Remote Sensing
Hyperspectral image (HSI) classification is a crucial technique that assigns each pixel in an image to a specific land cover category by leveraging both spectral and spatial information. In recent years, HSI classification methods based on convolutional neural networks (CNNs) and Transformers have significantly improved performance due to their strong feature extraction capabilities. However, these improvements often come with increased model complexity, leading to higher computational costs. To address this, we propose a compact and efficient spectral-spatial feature extraction and attention-based neural network (SSFAN) for HSI classification. The SSFAN model consists of three core modules: the Parallel Spectral-Spatial Feature Extraction Block (PSSB), the Scan Block, and the Squeeze-and-Excitation MLP Block (SEMB). After preprocessing, the HSI data are fed into the PSSB module, which contains two parallel streams, each comprising a 3D convolutional layer and a 2D convolutional layer. The 3D convolutional layer extracts spectral and spatial features from the input hyperspectral data, while the 2D convolutional layer further enhances the spatial feature representation. Next, the Scan Block module employs a layered scanning strategy to extract spatial information at different scales from the central pixel outward, enabling the model to capture both local and global spatial relationships. The SEMB module combines the Spectral-Spatial Recurrent Block (SSRB) and the MLP Block. The SSRB, with its adaptive weight assignment mechanism in the SToken Module, flexibly handles time steps and feature dimensions, performing deep spectral and spatial feature extraction through multiple state updates. Finally, the MLP Block processes the input features through a series of linear transformations, GELU activation functions, and Dropout layers, capturing complex patterns and relationships within the data, and concludes with an argmax layer for classification.
Experimental results show that the proposed SSFAN model delivers superior classification performance, outperforming the second-best method by 1.72%, 5.19%, and 1.94% in OA, AA, and Kappa coefficient, respectively, on the Indian Pines dataset. Additionally, it requires less training and testing time compared to other state-of-the-art deep learning methods.
- Research Article
- 10.1080/01431161.2021.2005840
- Feb 1, 2022
- International Journal of Remote Sensing
Traditional hyperspectral image (HSI) classification methods mainly include machine learning and convolutional neural network approaches. However, they depend heavily on large numbers of training samples. To obtain high accuracy with limited training samples, we propose a novel end-to-end spectral-spatial multi-scale network (SSMSN) for HSI classification. The SSMSN uses a multi-scale spectral module and a multi-scale spatial module to extract discriminative multi-scale spectral and spatial features separately. In both modules, the multi-scale Res2Net block structure can learn multi-scale features at a granular level and increase the range of receptive fields by constructing hierarchical residual-like connections within a single residual block. To alleviate overfitting and further improve classification accuracy with limited training samples, we adopt a simple but effective hinge cross-entropy loss function to train the SSMSN at a dynamic learning rate. Extensive experimental results demonstrate that, on the Indian Pines, University of Pavia, Kennedy Space Center, and Salinas Scene data sets, the proposed SSMSN achieves higher classification accuracy than state-of-the-art methods with limited training samples. Meanwhile, our SSMSN requires less training and testing time than the popular AUSSC method.
- Research Article
- 10.1364/josaa.478585
- Feb 21, 2023
- Journal of the Optical Society of America A
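The hierarchical residual-like connection inside a Res2Net block, which the SSMSN abstract above relies on, can be shown with a 1-D stand-in: channels are split into groups, the first group passes through unchanged, and each later group is transformed only after receiving the previous group's output, so deeper groups accumulate larger effective receptive fields. The matrix multiplies below replace the actual 3×3 convolutions, and the sizes are illustrative.

```python
import numpy as np

def res2net_split(x, convs):
    """Hierarchical residual-like connections inside one block (Res2Net idea).

    x: (N, C) features, split channel-wise into len(convs) + 1 groups.
    Group 0 passes through; group i (i >= 1) is transformed after adding
    the output of group i - 1, forming a within-block hierarchy.
    """
    groups = np.split(x, len(convs) + 1, axis=1)
    outs = [groups[0]]                           # identity branch
    prev = np.zeros_like(groups[0])
    for g, w in zip(groups[1:], convs):
        prev = np.maximum((g + prev) @ w, 0.0)   # "conv" stand-in + ReLU
        outs.append(prev)
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(5)
x = rng.standard_normal((4, 16))                 # 4 samples, 16 channels
convs = [rng.standard_normal((4, 4)) * 0.1 for _ in range(3)]  # 4 groups of 4
out = res2net_split(x, convs)
assert out.shape == x.shape
```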
In recent years, generative adversarial networks (GANs), consisting of two competing 2D convolutional neural networks (CNNs) used as a generator and a discriminator, have shown promising capabilities in hyperspectral image (HSI) classification tasks. Essentially, the performance of HSI classification lies in the ability to extract both spectral and spatial features. The 3D CNN has excellent advantages in simultaneously mining these two types of features but has rarely been used due to its high computational complexity. This paper proposes a hybrid spatial-spectral generative adversarial network (HSSGAN) for effective HSI classification. A hybrid CNN structure is developed for the construction of the generator and the discriminator. For the discriminator, the 3D CNN is utilized to extract the multi-band spatial-spectral feature, and the 2D CNN is then used to further represent the spatial information. To reduce the accuracy loss caused by information redundancy, a channel and spatial attention mechanism (CSAM) is specially designed. To be specific, a channel attention mechanism is exploited to enhance the discriminative spectral features, and a spatial self-attention mechanism is developed to learn long-term spatial similarity, which can effectively suppress invalid spatial features. Both quantitative and qualitative experiments on four widely used hyperspectral datasets show that the proposed HSSGAN has a satisfactory classification effect compared with conventional methods, especially with few training samples.
- Research Article
- 10.1109/lgrs.2022.3171536
- Jan 1, 2022
- IEEE Geoscience and Remote Sensing Letters
Recently, graph convolutional network (GCN) has drawn increasing attention in hyperspectral image (HSI) classification, as it can process arbitrary non-Euclidean data. However, dynamic GCN that refines the graph heavily relies on the graph embedding in the previous layer, which will result in performance degradation when the embedding contains noise. In this letter, we propose a novel dual residual graph convolutional network (DRGCN) for HSI classification that integrates two adjacency matrices of dual GCN. In detail, one GCN applies a soft adjacency matrix to extract spatial features, the other utilizes the dynamic adjacency matrix to extract global context-aware features. Subsequently, the features extracted by dual GCN are fused to make full use of the complementary and correlated information among two graph representations. Moreover, we introduce residual learning to optimize graph convolutional layers during the training process, to alleviate the over-smoothing problem. The advantage of dual GCN is that it can extract robust and discriminative features from HSI. Extensive experiments on four HSI data sets, including Indian Pines, Pavia University, Salinas, and Houston University, demonstrate the effectiveness and superiority of our proposed DRGCN, even with small-sized training data.
- Research Article
- 10.1109/tgrs.2021.3075546
- May 7, 2021
- IEEE Transactions on Geoscience and Remote Sensing
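The dual-branch design in the DRGCN abstract above — two GCNs operating on different adjacency matrices whose outputs are fused — can be sketched in NumPy. The soft and dynamic adjacencies below are toy constructions, and the fixed fusion weight `lam` stands in for whatever weighting the model would learn.

```python
import numpy as np

def gcn_layer(h, adj, w):
    """One GCN layer: add self-loops, symmetrically normalise, propagate."""
    a = adj + np.eye(adj.shape[0])              # self-loops
    d = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d[:, None] * d[None, :]        # D^-1/2 (A + I) D^-1/2
    return np.maximum(a_norm @ h @ w, 0.0)      # ReLU(A_hat H W)

rng = np.random.default_rng(6)
h = rng.standard_normal((5, 4))                 # 5 nodes, 4 features
adj_soft = rng.random((5, 5))
adj_soft = (adj_soft + adj_soft.T) / 2          # symmetric "soft" adjacency
adj_dyn = (h @ h.T > 0).astype(float)           # feature-driven "dynamic" graph
w1, w2 = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
# Weighted fusion of the two branch outputs, as in the dual-GCN design.
lam = 0.5
fused = lam * gcn_layer(h, adj_soft, w1) + (1 - lam) * gcn_layer(h, adj_dyn, w2)
assert fused.shape == (5, 3)
```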
Convolutional neural networks (CNNs), a kind of feedforward neural network with a deep structure, are one of the representative methods in hyperspectral image (HSI) classification. However, redundant information and interclass interference are common and challenging problems in HSI classification. In addition, if the spectral and spatial information is not properly extracted and analyzed, it will affect the classification performance of the network to a great extent. Aiming at these issues, this article proposes an HSI classification method based on an adaptive hash attention mechanism and a lower triangular network (AHA-LT). First, the attention mechanism is introduced in the preprocessing stage, which is composed of the spectral attention module and the adaptive hash spatial attention module in series. Then, the data processed by the attention mechanism are introduced into the lower triangular network (LTNet) to obtain the fused high-dimensional semantic features. Finally, we compress the features and obtain the output classification results through several fully connected layers. Among them, LTNet is composed of 2-D–3-D CNN and multiscale features. The network integrates the characteristics of multibranch, feature fusion, feature compression, and skip connections. Extensive experiments on four widely used HSI data sets show that the proposed method can obtain a great improvement in performance compared with the existing methods.
- Conference Article
- 10.1109/icicsp55539.2022.10050698
- Nov 26, 2022
Hyperspectral image (HSI) classification is a key technology in remote sensing image processing. In recent years, the convolutional neural network (CNN), a powerful feature extractor, has been introduced into the field of HSI classification. Since the features of HSI are the basis of HSI classification, how to effectively extract the spectral-spatial features from HSI with a CNN has become a research hotspot. HSI feature extraction networks based on two-dimensional (2D) and three-dimensional (3D) CNNs, which can extract both spectral and spatial information, may lead to an increase in parameters and computational cost. Compared with 2D and 3D CNNs, a one-dimensional (1D) CNN greatly reduces the number of parameters and the computational cost. However, 1D-CNN-based algorithms can only extract spectral information and do not consider spatial information. Therefore, in this paper, a lightweight multilevel feature fusion network (LMFFN) is proposed for HSI classification, which aims to achieve efficient extraction of spectral-spatial features while minimizing the number of parameters. The main contributions of this paper are twofold: First, we design a hybrid spectral-spatial feature extraction network (HSSFEN) to combine the advantages of 1D, 2D, and 3D CNNs. It introduces the idea of depthwise separable convolution, which effectively reduces the complexity of the proposed HSSFEN. Then, a multilevel spectral-spatial feature fusion network (MSSFFN) is proposed to obtain more effective spectral-spatial features, effectively fusing the bottom and top spectral-spatial features. To demonstrate the performance of our proposed method, a series of experiments are conducted on three HSI datasets: the Indian Pines, University of Pavia, and Salinas Scene datasets.
The experimental results indicate that our proposed LMFFN achieves better performance than both manual feature extraction methods and deep learning methods, demonstrating the superiority of our approach.
- Conference Article
- 10.1109/igarss46834.2022.9883452
- Jul 17, 2022
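The parameter saving that LMFFN gets from depthwise separable convolution is easy to verify by counting weights: a depthwise pass (one k×k filter per input channel) followed by a 1×1 pointwise pass replaces the full c_in × c_out bank of k×k filters. The channel sizes below are illustrative, not the paper's configuration.

```python
# Parameter counts for a standard vs. a depthwise-separable 2D convolution.
def standard_conv_params(c_in, c_out, k):
    return c_in * c_out * k * k            # one k x k filter per (in, out) pair

def separable_conv_params(c_in, c_out, k):
    depthwise = c_in * k * k               # one k x k filter per input channel
    pointwise = c_in * c_out               # 1 x 1 conv mixes the channels
    return depthwise + pointwise

c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)      # 73728 weights
sep = separable_conv_params(c_in, c_out, k)     # 8768 weights
assert sep < std
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For these sizes the separable form needs roughly an eighth of the weights, which is the kind of reduction the abstract attributes to its depthwise separable design.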
In the hyperspectral image (HSI) classification task, effectively deriving and incorporating spatial information into spectral features is a key focus, as it can largely influence performance. Markov random fields (MRFs) are generative and flexible image texture models capable of effectively extracting spatial neighbourhood information along multiple spectral wavebands in an unsupervised way. Their parameter estimation process also shares strong compatibility with deep architectures, especially convolutional neural networks. In this work, we propose an MRF-based spectral-spatial fusion network (SSFNet) for HSI classification. Spatial features are extracted using MRF models and further fused with spectral information. The proposed SSFNet then takes the fused features as input and produces reliable classification results. Comprehensive experiments conducted on the Indian Pines and Pavia University datasets verify the proposed method.