CNN and Transformer interaction network for hyperspectral image classification

Abstract
Convolutional neural networks (CNNs) have advanced hyperspectral image (HSI) classification effectively. Although many CNN-based models can extract local features in HSI, it is difficult for them to extract global features. With its ability to capture long-range dependencies, the Transformer is gradually gaining prominence in HSI classification, but it may overlook local details when extracting features. To address these issues, we propose a CNN and Transformer interaction network (CTIN) for HSI classification. Firstly, a dual-branch structure is constructed in which a CNN and a Transformer are arranged in parallel to simultaneously extract local and global features from the HSI. Secondly, a feature interaction module is inserted between the two branches, facilitating a bi-directional flow of information between the global and local feature spaces. In this way, the network combines the respective strengths of the CNN and the Transformer in feature extraction as much as possible. In addition, a token generation method is designed to harness the abundant contextual information relevant to the centre pixel and improve the accuracy of the final classification. Experiments were conducted on four hyperspectral datasets: two classical datasets (Indian Pines and Salinas Valley), a new satellite dataset (Yellow River), and a self-made UAV dataset (Yellow River Willow). Experimental results show that the proposed method outperforms other state-of-the-art methods, with overall accuracies of 99.21%, 99.61%, 92.40%, and 98.17%, respectively.
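The dual-branch design with bidirectional feature interaction can be illustrated with a minimal NumPy sketch. This is not the authors' CTIN implementation: `local_mixing`, `self_attention`, and `interaction_step` below are simplified, hypothetical stand-ins (a 1-D neighbourhood average for the CNN branch, single-head attention for the Transformer branch, and random matrices in place of learned projections) that only show how features could flow between the two branches.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, rng):
    # Global branch stand-in: single-head self-attention over tokens (n, d).
    n, d = tokens.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))
    return attn @ v

def local_mixing(tokens, rng):
    # Local branch stand-in: each token mixes only with its 1-D neighbours.
    d = tokens.shape[1]
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    padded = np.pad(tokens, ((1, 1), (0, 0)))
    neigh = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0  # window of 3
    return neigh @ W

def interaction_step(local_f, global_f, rng):
    # Bi-directional interaction: each branch receives a projection of the other
    # branch's features before applying its own mixing operation.
    d = local_f.shape[1]
    P_lg = rng.standard_normal((d, d)) / np.sqrt(d)  # local -> global
    P_gl = rng.standard_normal((d, d)) / np.sqrt(d)  # global -> local
    new_local = local_mixing(local_f + global_f @ P_gl, rng)
    new_global = self_attention(global_f + local_f @ P_lg, rng)
    return new_local, new_global

rng = np.random.default_rng(0)
tokens = rng.standard_normal((25, 16))  # e.g. a 5x5 HSI patch, 16-dim embeddings
loc, glo = tokens.copy(), tokens.copy()
for _ in range(2):
    loc, glo = interaction_step(loc, glo, rng)
fused = np.concatenate([loc, glo], axis=1)
print(fused.shape)  # (25, 32)
```

The concatenation at the end mirrors the general idea of combining the two feature spaces before classification; the real network would learn all projection matrices end to end.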

Similar Papers
  • Research Article
  • Citations: 10
  • 10.1109/tgrs.2023.3282247
A Lightweight Hybrid Convolutional Neural Network for Hyperspectral Image Classification
  • Jan 1, 2023
  • IEEE Transactions on Geoscience and Remote Sensing
  • Xiaohu Ma + 6 more

Recent studies have demonstrated the potential of hybrid convolutional models that combine 3D and 2D convolutional neural networks (CNNs) for hyperspectral image (HSI) classification. However, these models do not fully utilize the benefits of hybrid convolution due to inefficient connections between the two types of CNNs. Moreover, most CNNs, including hybrid models, require a significant number of parameters and computational resources for accurate classification, which increases the need for labeled samples and computational cost. Although the common lightweight strategies like depthwise separable convolution (DSC) can reduce parameters and computation compared to normal convolution (NC), they often compromise accuracy. To address these challenges, we propose a lightweight hybrid convolutional neural network (Lite-HCNet) for HSI classification with minimal model parameters and computational effort. Firstly, we design a novel channel attention module (NCAM) and combine it with a convolutional kernel decomposition (CKD) strategy to propose a lightweight and efficient DSC (LE-DSC) deployed in Lite-HCNet. The LE-DSC not only reduces the DSC volume further but also enhances its performance. Secondly, a lightweight and efficient hybrid convolutional layer (LE-HCL) is designed in Lite-HCNet to explore the efficient connection structure between 3D CNNs and 2D CNNs. Experiments show that the Lite-HCNet reduces the required computational cost and practical deployment difficulty while offering advanced performance with a small number of training samples. Furthermore, abundant ablation experiments confirm the superior performance of the designed LE-DSC.

  • Research Article
  • Citations: 393
  • 10.1109/tgrs.2020.2994057
Residual Spectral–Spatial Attention Network for Hyperspectral Image Classification
  • May 28, 2020
  • IEEE Transactions on Geoscience and Remote Sensing
  • Minghao Zhu + 4 more

In the last five years, deep learning has been introduced to tackle hyperspectral image (HSI) classification and has demonstrated good performance. In particular, convolutional neural network (CNN)-based methods for HSI classification have made great progress. However, due to the high dimensionality of HSI and the equal treatment of all bands, the performance of these methods is hampered by learning features from bands that are useless for classification. Moreover, for patchwise CNN models, equal treatment of spatial information from the pixel-centered neighborhood also hinders performance. In this article, we propose an end-to-end residual spectral-spatial attention network (RSSAN) for HSI classification. The RSSAN takes raw 3-D cubes as input data without additional feature engineering. First, a spectral attention module is designed for spectral band selection from the raw input data, emphasizing bands that are useful for classification and suppressing useless ones. Then, a spatial attention module is designed for the adaptive selection of spatial information, emphasizing pixels in the pixel-centered neighborhood that belong to the same class as the center pixel or are useful for classification, and suppressing those from a different class or that are useless. Second, the two attention modules are also used in the following CNN for adaptive feature refinement in spectral-spatial feature learning. Third, a sequential spectral-spatial attention module is embedded into a residual block to avoid overfitting and accelerate the training of the proposed model. Experimental studies demonstrate that the RSSAN achieved superior classification accuracy compared with the state of the art on three HSI data sets: Indian Pines (IN), University of Pavia (UP), and Kennedy Space Center (KSC).
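The band-reweighting idea behind a spectral attention module can be sketched in a squeeze-and-excitation style. This is an illustrative NumPy toy, not the RSSAN code: `W1` and `W2` are random stand-ins for learned weights, and the module simply scales each spectral band by a weight in (0, 1) derived from its global average.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spectral_attention(cube, W1, W2):
    # cube: (H, W, B). Derive one weight per band from the band-wise global
    # average, then reweight the bands: useful bands up, useless bands down.
    squeeze = cube.mean(axis=(0, 1))                     # (B,) band descriptor
    excite = sigmoid(np.maximum(squeeze @ W1, 0) @ W2)   # (B,) weights in (0, 1)
    return cube * excite, excite

rng = np.random.default_rng(3)
b, r = 16, 4                                  # bands, reduction ratio
cube = rng.standard_normal((7, 7, b))         # toy patch, not real HSI data
W1 = rng.standard_normal((b, r)) / np.sqrt(b) # reduction projection
W2 = rng.standard_normal((r, b)) / np.sqrt(r) # expansion projection
out, weights = spectral_attention(cube, W1, W2)
print(out.shape, weights.shape)
```

In a trained network the weights would be learned so that discriminative bands receive values near 1 and noisy bands values near 0.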

  • Research Article
  • Citations: 2
  • 10.3390/rs16224202
SSFAN: A Compact and Efficient Spectral-Spatial Feature Extraction and Attention-Based Neural Network for Hyperspectral Image Classification
  • Nov 11, 2024
  • Remote Sensing
  • Chunyang Wang + 6 more

Hyperspectral image (HSI) classification is a crucial technique that assigns each pixel in an image to a specific land cover category by leveraging both spectral and spatial information. In recent years, HSI classification methods based on convolutional neural networks (CNNs) and Transformers have significantly improved performance due to their strong feature extraction capabilities. However, these improvements often come with increased model complexity, leading to higher computational costs. To address this, we propose a compact and efficient spectral-spatial feature extraction and attention-based neural network (SSFAN) for HSI classification. The SSFAN model consists of three core modules: the Parallel Spectral-Spatial Feature Extraction Block (PSSB), the Scan Block, and the Squeeze-and-Excitation MLP Block (SEMB). After preprocessing, the HSI data are fed into the PSSB module, which contains two parallel streams, each comprising a 3D convolutional layer and a 2D convolutional layer. The 3D convolutional layer extracts spectral and spatial features from the input hyperspectral data, while the 2D convolutional layer further enhances the spatial feature representation. Next, the Scan Block module employs a layered scanning strategy to extract spatial information at different scales from the central pixel outward, enabling the model to capture both local and global spatial relationships. The SEMB module combines the Spectral-Spatial Recurrent Block (SSRB) and the MLP Block. The SSRB, with its adaptive weight assignment mechanism in the SToken Module, flexibly handles time steps and feature dimensions, performing deep spectral and spatial feature extraction through multiple state updates. Finally, the MLP Block processes the input features through a series of linear transformations, GELU activation functions, and Dropout layers, capturing complex patterns and relationships within the data, and concludes with an argmax layer for classification. Experimental results show that the proposed SSFAN model delivers superior classification performance, outperforming the second-best method by 1.72%, 5.19%, and 1.94% in OA, AA, and Kappa coefficient, respectively, on the Indian Pines dataset. Additionally, it requires less training and testing time compared with other state-of-the-art deep learning methods.

  • Research Article
  • Citations: 2
  • 10.1364/josaa.478585
Hybrid spatial-spectral generative adversarial network for hyperspectral image classification.
  • Feb 21, 2023
  • Journal of the Optical Society of America A
  • Chao Ma + 5 more

In recent years, generative adversarial networks (GANs), consisting of two competing 2D convolutional neural networks (CNNs) that are used as a generator and a discriminator, have shown promising capabilities in hyperspectral image (HSI) classification tasks. Essentially, the performance of HSI classification lies in the ability to extract both spectral and spatial features. The 3D CNN has excellent advantages in simultaneously mining these two types of features but has rarely been used due to its high computational complexity. This paper proposes a hybrid spatial-spectral generative adversarial network (HSSGAN) for effective HSI classification. The hybrid CNN structure is developed for the construction of the generator and the discriminator. For the discriminator, a 3D CNN is utilized to extract the multi-band spatial-spectral feature, and a 2D CNN is then used to further represent the spatial information. To reduce the accuracy loss caused by information redundancy, a channel and spatial attention mechanism (CSAM) is specially designed. To be specific, a channel attention mechanism is exploited to enhance the discriminative spectral features. Furthermore, a spatial self-attention mechanism is developed to learn the long-term spatial similarity, which can effectively suppress invalid spatial features. Both quantitative and qualitative experiments on four widely used hyperspectral datasets show that the proposed HSSGAN has a satisfactory classification effect compared to conventional methods, especially with few training samples.

  • Research Article
  • Citations: 1
  • 10.14358/pers.22-00111r2
Lightweight Parallel Octave Convolutional Neural Network for Hyperspectral Image Classification
  • Apr 1, 2023
  • Photogrammetric Engineering & Remote Sensing
  • Dan Li + 5 more

Although most deep learning-based methods have achieved excellent performance for hyperspectral image (HSI) classification, they are often limited by complex networks and require massive training samples in practical applications. Therefore, designing an efficient, lightweight model that obtains better classification results in small-sample situations remains a challenging task. To alleviate this problem, a novel lightweight parallel octave convolutional neural network (LPOCNN) for HSI classification is proposed in this paper. First, the HSI data are preprocessed to construct two three-dimensional (3D) patch cubes with different spatial and spectral scales for each central pixel, removing redundancy and focusing on extracting spatial and spectral features, respectively. Next, two non-deep parallel branches are created for the two inputs, which use octave convolution rather than classical 3D convolution to keep the model lightweight. Then, a two-dimensional convolutional neural network is used to extract deeper spectral-spatial features when fusing spectral-spatial features from different parallel layers. Moreover, spectral-spatial attention is designed to further promote classification performance by adaptively adjusting the weights of different spectral-spatial features according to their contribution to classification. Experiments show that our LPOCNN acquires a significant advantage in classification performance over other competitive methods in small-sample situations.

  • Research Article
  • Citations: 32
  • 10.3390/rs12122035
Residual Group Channel and Space Attention Network for Hyperspectral Image Classification
  • Jun 24, 2020
  • Remote Sensing
  • Peida Wu + 3 more

Recently, deep learning methods based on three-dimensional (3-D) convolution have been widely used in hyperspectral image (HSI) classification tasks and have shown good classification performance. However, affected by the irregular distribution of classes in HSI datasets, most previous 3-D convolutional neural network (CNN)-based models require more training samples to obtain better classification accuracies. In addition, as the network deepens, the spatial resolution of the feature maps gradually decreases, and much useful information may be lost during training. Therefore, ensuring efficient network training is key to HSI classification tasks. To address this issue, in this paper we propose a 3-D-CNN-based residual group channel and space attention network (RGCSA) for HSI classification. Firstly, the proposed bottom-up top-down attention structure with residual connections improves network training efficiency by optimizing channel-wise and spatial-wise features throughout the whole training process. Secondly, the proposed residual group channel-wise attention module reduces the possibility of losing useful information, and the novel spatial-wise attention module extracts context information to strengthen the spatial features. Furthermore, our proposed RGCSA network needs only a few training samples to achieve higher classification accuracies than previous 3-D-CNN-based networks. The experimental results on three commonly used HSI datasets demonstrate the superiority of our proposed network based on the attention mechanism and the effectiveness of the proposed channel-wise and spatial-wise attention modules for HSI classification. The code and configurations are released at Github.com.

  • Research Article
  • 10.1080/01431161.2025.2457130
Bridging branches and attributes: spectral-spatial global-local interaction network for hyperspectral image classification
  • Feb 7, 2025
  • International Journal of Remote Sensing
  • Leiquan Wang + 7 more

The CNN-Transformer joint model stands as the leading architecture for contemporary hyperspectral image (HSI) classification, integrating global and local features through either successive or dual-branch CNN and Transformer networks. However, these methods often fall short in effectively incorporating spatial-spectral information with local-global attributes, resulting in incomplete feature representation. To address these challenges, we propose a spectral-spatial global-local interaction network that transmits global and local features into the spatial and spectral branches, facilitated by cross-interaction operators to ensure adequate feature flow. Initially, CNNs are employed to separately extract shallow features for the spectral and spatial branches. We then introduce a Spectral-Spatial Global-Local Interaction block designed for deep feature extraction, enhancing the flow of spectral and spatial features with local and global attributes using parallel transformers and dynamic convolutions. Transformers model the long-range dependencies of global spectral and spatial features, while dynamic convolutions enhance the context sensitivity of local spectral and spatial representations. Quadruple cross-interaction blocks are proposed to traverse both the spectral-spatial branches and local-global attribute dimensions, facilitating information exchange for complementary HSI representation. Extensive experiments and ablation studies on four public HSI datasets demonstrate the superiority of our proposed method.
Keywords: convolutional neural networks (CNNs); spectral and spatial; global and local; hyperspectral image classification; cross-interaction.

  • Research Article
  • Citations: 3
  • 10.3390/rs15174219
Spectral Segmentation Multi-Scale Feature Extraction Residual Networks for Hyperspectral Image Classification
  • Aug 28, 2023
  • Remote Sensing
  • Jiamei Wang + 3 more

Hyperspectral image (HSI) classification is a vital task in hyperspectral image processing and applications. Convolutional neural networks (CNNs) are becoming an effective approach for categorizing hyperspectral remote sensing images as deep learning technology advances. However, traditional CNNs usually use a fixed kernel size, which limits the model’s capacity to acquire new features and affects classification accuracy. Based on this, we developed a spectral segmentation-based multi-scale spatial feature extraction residual network (MFERN) for hyperspectral image classification. MFERN divides the input data into many non-overlapping sub-bands along the spectral dimension, extracts features in parallel using the multi-scale spatial feature extraction module (MSFE), and adds global branches on top of this to obtain global information over the full spectral range of the image. Finally, the extracted features are fused and sent to the classifier. Our MSFE module has multiple branches with increasing receptive field (RF) ranges, enabling multi-scale spatial information extraction at both fine- and coarse-grained levels. We conducted extensive experiments on the Indian Pines (IP), Salinas (SA), and Pavia University (PU) HSI datasets. The results show that our model has the best performance and robustness, and our proposed MFERN significantly outperforms other models in terms of classification accuracy, even with a small amount of training data.
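The spectral-segmentation step (splitting the band axis into non-overlapping sub-bands that parallel branches then process) can be sketched in a few lines. This is a toy NumPy illustration, not the MFERN code, and the random cube stands in for real HSI data.

```python
import numpy as np

def split_subbands(cube, n_groups):
    # cube: (H, W, B). Split the spectral axis into n_groups non-overlapping
    # sub-bands; np.array_split tolerates B not divisible by n_groups.
    return np.array_split(cube, n_groups, axis=2)

rng = np.random.default_rng(4)
cube = rng.standard_normal((5, 5, 103))   # e.g. a 103-band sensor
groups = split_subbands(cube, 4)
print([g.shape[2] for g in groups])       # band counts per sub-band
```

Each sub-band cube would then be fed to its own spatial feature extraction branch before the fused classification stage.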

  • Conference Article
  • Citations: 1
  • 10.1109/igarss46834.2022.9883452
Markov Random Field Based Spectral-Spatial Fusion Network for Hyperspectral Image Classification
  • Jul 17, 2022
  • Yao Peng + 1 more

In the hyperspectral image (HSI) classification task, effectively deriving and incorporating spatial information into spectral features is a key focus, as it can largely influence performance. Markov random fields (MRFs) are generative and flexible image texture models, capable of effectively extracting spatial neighbourhood information along multiple spectral wavebands in an unsupervised way. Their parameter estimation process also shares strong compatibility with deep architectures, especially convolutional neural networks. In this work, we propose an MRF-based spectral-spatial fusion network (SSFNet) for HSI classification. Spatial features are extracted using MRF models and further fused with spectral information. The proposed SSFNet then takes the fused features as input and produces reliable classification results. Comprehensive experiments conducted on the Indian Pines and Pavia University datasets verify the proposed method.

  • Research Article
  • Citations: 1
  • 10.3390/electronics14112234
FSFF-Net: A Frequency-Domain Feature and Spatial-Domain Feature Fusion Network for Hyperspectral Image Classification
  • May 30, 2025
  • Electronics
  • Xinyu Pan + 4 more

In hyperspectral image (HSI) classification, each pixel is assigned to a specific land cover type, which is critical for applications in environmental monitoring, agriculture, and urban planning. Convolutional neural networks (CNNs) and Transformers have become widely adopted due to their exceptional feature extraction capabilities. However, the local receptive field of CNNs limits their ability to capture global context, while Transformers, though effective in modeling long-range dependencies, introduce computational overhead. To address these challenges, we propose a frequency-domain and spatial-domain feature fusion network (FSFF-Net) for HSI classification, which reduces computational complexity while capturing global features. The FSFF-Net consists of a frequency-domain transformer (FDformer) and a depthwise-convolution-based parallel encoder structure. The FDformer replaces the self-attention mechanism in traditional Vision Transformers with a three-step process: a two-dimensional discrete Fourier transform (2D-DFT), an adaptive filter, and a two-dimensional inverse discrete Fourier transform (2D-IDFT). The 2D-DFT and 2D-IDFT convert images between the spatial and frequency domains. The adaptive filter adaptively retains important frequency components, removes redundant ones, and assigns weights to different frequency components. This module not only reduces computational overhead by decreasing the number of parameters, but also mitigates the limitations of CNNs by capturing complementary frequency-domain features, which enhance the spatial-domain features for improved classification. In parallel, depthwise convolution is employed to capture spatial-domain features. The network then integrates the frequency-domain features from the FDformer and the spatial-domain features from the depthwise convolution through a feature fusion module. Experimental results demonstrate that our method is efficient and robust for HSI classification, achieving overall accuracies of 98.03%, 99.57%, 97.05%, and 98.40% on the Indian Pines, Pavia University, Salinas, and Houston 2013 datasets, respectively.
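The three-step frequency-domain token mixing described above (2D-DFT, adaptive filter, 2D-IDFT) can be sketched in a few lines of NumPy. This is only a schematic, not the FSFF-Net code: the fixed `low_pass` mask stands in for the learnable adaptive filter.

```python
import numpy as np

def frequency_filter_mixing(feat, filt):
    # feat: (H, W, C) spatial-domain feature map.
    # filt: (H, W, C) frequency-domain weights (learnable in the real model).
    spec = np.fft.fft2(feat, axes=(0, 1))        # 2-D DFT per channel
    spec = spec * filt                           # element-wise adaptive filtering
    return np.fft.ifft2(spec, axes=(0, 1)).real  # 2-D inverse DFT back to space

rng = np.random.default_rng(1)
x = rng.standard_normal((9, 9, 8))               # a 9x9 patch with 8 channels
low_pass = np.zeros((9, 9, 8))
low_pass[:3, :3] = 1.0                           # toy filter: keep low frequencies
y = frequency_filter_mixing(x, low_pass)
print(y.shape)
```

With an all-ones filter the operation is the identity, which is a handy sanity check; a learned filter instead reweights frequency components to keep the informative ones.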

  • Research Article
  • 10.1080/01431161.2024.2398822
Cnn-assisted multi-hop graph attention network for hyperspectral image classification
  • Oct 3, 2024
  • International Journal of Remote Sensing
  • Hongxi Wang + 3 more

Recently, the convolutional neural network (CNN) has gained widespread adoption in hyperspectral image (HSI) classification owing to its remarkable feature extraction capability. However, the fixed receptive field of the CNN restricts it to Euclidean image data only, making it difficult to capture complex information in hyperspectral data. To overcome this problem, much attention has been paid to the graph attention network (GAT), which can effectively model graph structure and capture complex dependencies between nodes. However, the GAT usually acts on superpixel nodes, which may lead to the loss of pixel-level information. To better integrate the advantages of both, we propose a CNN-assisted multi-hop graph attention network (CMGAT) for HSI classification. Specifically, a parallel dual-branch architecture is first constructed to simultaneously capture spectral-spatial features from hyperspectral data at the superpixel and pixel levels using a GAT and a CNN, respectively. On this basis, multi-hop and multi-scale mechanisms are further employed to construct a multi-hop GAT module and a multi-scale CNN module that capture diverse feature information. Secondly, an attention module is cascaded before the multi-scale CNN module to improve classification performance. Finally, the output information from the two branches is weighted and fused to produce the classification result. We performed experiments on four benchmark HSI datasets, including Indian Pines (IP), University of Pavia (UP), Salinas Valley (SV), and WHU-Hi-LongKou (LK). The results demonstrate that the proposed method outperforms several deep learning methods, achieving overall accuracies of 95.67%, 99.04%, 99.55%, and 99.51%, respectively, even with fewer training samples.

  • Research Article
  • Citations: 172
  • 10.1109/tgrs.2019.2910603
Automatic Design of Convolutional Neural Network for Hyperspectral Image Classification
  • Sep 1, 2019
  • IEEE Transactions on Geoscience and Remote Sensing
  • Yushi Chen + 5 more

Hyperspectral image (HSI) classification is a core task in the remote sensing community, and recently, deep learning-based methods have shown their capability for accurate classification of HSIs. Among them, deep convolutional neural networks (CNNs) have been widely used for HSI classification. In order to obtain good classification performance, substantial effort is required to design a proper deep learning architecture, and a manually designed architecture may not fit a specific data set very well. In this paper, the idea of automatic CNN design for HSI classification is proposed for the first time. First, a number of operations, including convolution, pooling, identity, and batch normalization, are selected. Then, a gradient descent-based search algorithm is used to effectively find the optimal deep architecture, which is evaluated on the validation data set. After that, the best CNN architecture is selected as the model for HSI classification. Specifically, the automatic 1-D Auto-CNN and 3-D Auto-CNN are used as spectral and spectral–spatial HSI classifiers, respectively. Furthermore, cutout is introduced as a regularization technique for HSI spectral–spatial classification to further improve accuracy. Experiments on four widely used hyperspectral data sets (i.e., Salinas, Pavia University, Kennedy Space Center, and Indian Pines) show that the automatically designed data-dependent CNNs obtain competitive classification accuracy compared with state-of-the-art methods. In addition, the automatic design of deep learning architectures opens a new window for future research, showing the huge potential of neural architecture optimization for accurate HSI classification.

  • Conference Article
  • 10.1109/icicsp55539.2022.10050698
Lightweight Multilevel Feature Fusion Network for Hyperspectral Image Classification
  • Nov 26, 2022
  • Quanyu Huang + 3 more

Hyperspectral image (HSI) classification is a key technology in remote sensing image processing. In recent years, the convolutional neural network (CNN), a powerful feature extractor, has been introduced into the field of HSI classification. Since the features of an HSI are the basis of its classification, how to effectively extract spectral-spatial features from HSI with CNNs has become a research hotspot. An HSI feature extraction network based on two-dimensional (2D) and three-dimensional (3D) CNNs can extract both spectral and spatial information, but may increase the number of parameters and the computational cost. Compared with 2D and 3D CNNs, a one-dimensional (1D) CNN greatly reduces both. However, 1D-CNN-based algorithms can only extract spectral information without considering spatial information. Therefore, in this paper, a lightweight multilevel feature fusion network (LMFFN) is proposed for HSI classification, which aims to achieve efficient extraction of spectral-spatial features while minimizing the number of parameters. The main contributions of this paper are twofold. First, we design a hybrid spectral-spatial feature extraction network (HSSFEN) to combine the advantages of 1D, 2D, and 3D CNNs. It introduces the idea of depthwise separable convolution, which effectively reduces the complexity of the proposed HSSFEN. Then, a multilevel spectral-spatial feature fusion network (MSSFFN) is proposed to obtain more effective spectral-spatial features by fusing the bottom and top spectral-spatial features. To demonstrate the performance of our proposed method, a series of experiments is conducted on three HSI datasets: Indian Pines, University of Pavia, and Salinas Scene. The experimental results indicate that our proposed LMFFN achieves better performance than manual feature extraction methods and other deep learning methods, which demonstrates the superiority of our approach.

  • Research Article
  • Citations: 10
  • 10.1109/lgrs.2023.3241720
2SRS: Two-Stream Residual Separable Convolution Neural Network for Hyperspectral Image Classification
  • Jan 1, 2023
  • IEEE Geoscience and Remote Sensing Letters
  • Zharfan Zahisham + 4 more

Typically, hyperspectral images suffer from redundant information, data scarcity, and class imbalance. This letter proposes a hyperspectral image classification framework named the Two-Stream Residual Separable Convolution (2SRS) network that aims to mitigate these problems. Principal component analysis (PCA) is first employed to reduce the spectral dimension of the hyperspectral image. Subsequently, the data scarcity and class imbalance problems are overcome via spatial and spectral data augmentation. A novel method of creating spectral data from image patches is proposed. The augmented samples are fed into the proposed 2SRS network for hyperspectral image classification. We evaluated the proposed method on three benchmark datasets, namely (1) Indian Pines, (2) Pavia University, and (3) Salinas Scene. The proposed method achieved state-of-the-art performance in terms of overall accuracy (OA), average accuracy (AA), and Kappa coefficient (Kappa) for both 30% and 10% training set ratios.
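The PCA step used here reduces the spectral dimension by projecting every pixel's spectrum onto the leading eigenvectors of the band covariance matrix. A minimal NumPy sketch (not the authors' code, with a random cube standing in for real HSI data):

```python
import numpy as np

def pca_reduce(cube, n_components):
    # cube: (H, W, B) hyperspectral cube; keep n_components spectral components.
    h, w, b = cube.shape
    flat = cube.reshape(-1, b)
    flat = flat - flat.mean(axis=0)              # centre each band
    cov = flat.T @ flat / (flat.shape[0] - 1)    # band covariance matrix (B, B)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]     # leading principal directions
    return (flat @ top).reshape(h, w, n_components)

rng = np.random.default_rng(2)
cube = rng.standard_normal((10, 10, 200))        # toy 200-band image
reduced = pca_reduce(cube, 30)
print(reduced.shape)  # (10, 10, 30)
```

The spatial layout is untouched; only the spectral axis shrinks, which is what lets the subsequent network operate on far fewer input channels.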

  • Research Article
  • Citations: 10
  • 10.1016/j.neucom.2024.128271
Global-local manifold embedding broad graph convolutional network for hyperspectral image classification
  • Jul 26, 2024
  • Neurocomputing
  • Heling Cao + 5 more
