Hyperspectral Image Classification Based on Two-Branch Multiscale Spatial Spectral Feature Fusion with Self-Attention Mechanisms
In recent years, using deep neural networks for effective feature extraction and designing efficient, high-precision hyperspectral image classification algorithms have become a research hotspot. However, because hyperspectral images are difficult to obtain and costly to annotate, training samples are very limited. To cope with the small-sample problem, researchers often deepen the network model and use attention mechanisms to extract features; however, as the model deepens, gradients vanish, feature extraction ability becomes insufficient, and computational cost rises. Making full use of the spectral and spatial information in limited samples has therefore become a difficult problem. To address it, this paper proposes FHDANet, a hyperspectral image classification model based on two-branch multiscale spatial–spectral feature aggregation with a self-attention mechanism. The model constructs a dense two-branch pyramid structure that efficiently extracts joint spatial–spectral feature information and spectral feature information, substantially reduces feature loss, and strengthens the model’s ability to extract contextual information. A channel–space attention module, ECBAM, is proposed to improve the model’s extraction of salient features, and a spatial information extraction module based on the deep feature fusion strategy HLDFF is proposed to strengthen feature reusability and mitigate the feature loss caused by deepening the model. Compared with seven hyperspectral image classification algorithms (SVM, SSRN, A2S2K-ResNet, HyBridSN, SSDGL, RSSGL and LANet), this method significantly improves classification performance on four representative datasets. 
Experiments have demonstrated that FHDANet can better extract and utilise the spatial and spectral information in hyperspectral images with excellent classification performance under small sample conditions.
- # Spectral Information
- # Spectral Information In Hyperspectral Images
- # Spatial Information
- # Hyperspectral Image Classification
- # Information In Hyperspectral Images
- # Spectral Feature Information
- # Hyperspectral Image
- # Small Sample Conditions
- # Excellent Classification Performance
- # Feature Extraction Ability
- Conference Article
1
- 10.1145/3641584.3641609
- Sep 22, 2023
With continuous innovation in deep learning, applying it to hyperspectral image classification to improve accuracy has become a major research direction. Convolutional neural networks (CNNs) are among the most commonly used deep learning methods for visual data and are widely used in hyperspectral image (HSI) classification by virtue of their excellent contextual modeling capability. HSI classification performance depends heavily on spatial and spectral information, yet current CNN-based models extract spatial–spectral features insufficiently and fail to mine and represent the sequence properties of spectral features well. To address these problems, this paper proposes an HSI classification method that combines a 3D attention mechanism with a Transformer. We introduce a variant Transformer model built on a hybrid of improved 3D-CNN and 2D-CNN, which combines complementary spatial–spectral and spectral information through 3D and 2D convolution, adds a variant attention module to strengthen spatial texture features, and uses a grouped transfer Transformer with skip connections so that the lower layers can better learn upper-layer features. First, a variant channel attention mechanism is introduced on the 3D-CNN to enhance its acquisition of spectral information. Second, a variant spatial attention mechanism is introduced so that the 3D-CNN can better acquire the spatial information of hyperspectral images, and the acquired spatial and spectral feature information is then passed to the 2D-CNN so that it can better acquire local feature information. 
Finally, the acquired image feature information is passed to the variant Transformer model, which compensates for the fact that a CNN can only acquire hyperspectral image features in local contexts and captures global information over feature sequences. Experiments on two hyperspectral datasets, Indian Pines and Pavia University, show that the overall classification accuracy (OA), average classification accuracy (AA), and Kappa coefficient reach up to 99.59%, 99.31%, and 99.45%, respectively, on the PU dataset, an improvement in classification accuracy over current cutting-edge techniques.
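The OA, AA, and Kappa figures quoted above are standard summaries of a confusion matrix. The sketch below shows one common way to compute them; the function name `classification_metrics` and the toy matrix in the usage note are illustrative assumptions, not material from the paper.

```python
def classification_metrics(confusion):
    """Return (OA, AA, kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Overall accuracy: fraction of all samples on the diagonal.
    correct = sum(confusion[i][i] for i in range(n_classes))
    oa = correct / total

    # Average accuracy: mean of per-class recall, skipping classes with no samples.
    row_totals = [sum(row) for row in confusion]
    recalls = [confusion[i][i] / row_totals[i]
               for i in range(n_classes) if row_totals[i] > 0]
    aa = sum(recalls) / len(recalls)

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        # Degenerate single-class case; returning 1.0 is a convention chosen here.
        return oa, aa, 1.0
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa
```

For a two-class matrix `[[8, 2], [1, 9]]`, the overall and average accuracy are both 0.85 and kappa is 0.7, because the chance agreement for that matrix is 0.5.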
- Research Article
14
- 10.3390/electronics14040797
- Feb 18, 2025
- Electronics
In contrast to conventional remote sensing images, hyperspectral remote sensing images are characterized by a greater number of spectral bands and exceptionally high resolution. The richness of both spectral and spatial information facilitates the precise classification of various objects within the images, establishing hyperspectral imaging as indispensable for remote sensing applications. However, the labor-intensive and time-consuming process of labeling hyperspectral images results in limited labeled samples, while challenges like spectral similarity between different objects and spectral variation within the same object further complicate the development of classification algorithms. Therefore, efficiently exploiting the spatial and spectral information in hyperspectral images is crucial for accomplishing the classification task. To address these challenges, this paper presents a multi-scale feature fusion convolutional neural network (MSFF). The network introduces a dual branch spectral and spatial feature extraction module utilizing 3D depthwise separable convolution for joint spectral and spatial feature extraction, further refined by an attention-based-on-central-pixels (ACP) mechanism. Additionally, a spectral–spatial joint attention module (SSJA) is designed to interactively explore latent dependency between spectral and spatial information through the use of multilayer perceptron and global pooling operations. Finally, a feature fusion module (FF) and an adaptive multi-scale feature extraction module (AMSFE) are incorporated to enable adaptive feature fusion and comprehensive mining of feature information. Experimental results demonstrate that the proposed method performs exceptionally well on the IP, PU, and YRE datasets, delivering superior classification results compared to other methods and underscoring the potential and advantages of MSFF in hyperspectral remote sensing classification.
- Conference Article
1
- 10.1109/icsp.2018.8652468
- Aug 1, 2018
Hyperspectral images contain richer spectral and spatial information than common images and are widely used in military, agricultural, and other fields. With the development of sensor technology, the spatial and spectral resolution of hyperspectral images have improved significantly. However, one disadvantage is that a hyperspectral image may capture only part of an object, and different parts of the same object can have different spectral information. This leads to unsatisfactory performance in traditional pixel-level hyperspectral image classification. Thus, a new hyperspectral image classification framework based on convolutional neural networks is proposed. First, band selection is adopted to obtain multiple sets of false color images for small-sample hyperspectral data. Then, parallel CNNs are introduced to obtain the classification results of different band combinations. Finally, a statistical analysis strategy is applied to obtain the final output. Experiments show that the classification accuracy of this method is better than that of the previous algorithm on the same dataset.
- Research Article
12
- 10.15201/hungeobull.66.2.4
- Jul 4, 2017
- Hungarian Geographical Bulletin
High spatial and spectral resolution aerial images make it possible to develop detailed and large-scale (about 1:5,000) urban land cover maps. The main objectives of this study are (1) to evaluate the correlation between laboratory and hyperspectral image spectra to select proper bands and training samples for classification; (2) to develop a classification process to combine the spectral and spatial information of multispectral and hyperspectral images and make an urban land cover map for the study area in Szeged, Hungary; and (3) to examine the effect of different roof types on the modification of surface temperature. Reference materials were collected from the training area and their spectral characteristics were measured by a laboratory spectrometer. The hyperspectral image and laboratory spectral data between 500 and 800 nm showed a very strong correlation, with a correlation coefficient of 0.99. The urban land cover map was produced by combining a segmentation procedure with the Spectral Angle Mapper (SAM) method, using the spatial information derived from the multispectral image and the spectral information of the hyperspectral image. Eight land cover classes were identified as impervious surfaces (asphalt, 4 types of tiled roof), water, and green vegetation. The overall accuracy of the urban land cover map was 87.9 per cent. According to the results, an accurate large-scale urban land cover map can be generated from the fusion of multispectral and hyperspectral images. We showed that certain roof types have a significant effect on surface temperature, which is strongly connected to the urban heat island phenomenon and influences population health.
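Spectral Angle Mapper treats each spectrum as a vector and scores similarity by the angle between a pixel spectrum and a reference spectrum, so a smaller angle means a closer match. The sketch below illustrates the general technique only; the function names and the reference values in the usage note are hypothetical, and the study's own pipeline (which also uses segmentation) is not reproduced here.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra given as equal-length sequences."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_pixel = math.sqrt(sum(p * p for p in pixel))
    norm_reference = math.sqrt(sum(r * r for r in reference))
    if norm_pixel == 0 or norm_reference == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_pixel * norm_reference)))
    return math.acos(cosine)


def classify_by_sam(pixel, references):
    """Return the name of the reference spectrum with the smallest angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))
```

Because the angle ignores vector length, a pixel that is a uniformly brighter or darker copy of a reference still matches it, for example `classify_by_sam([0.2, 0.2, 0.25], {"asphalt": [0.1, 0.1, 0.12], "vegetation": [0.05, 0.08, 0.5]})` selects `"asphalt"`.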
- Research Article
8
- 10.1080/01431161.2024.2370501
- Jul 5, 2024
- International Journal of Remote Sensing
Graph neural networks (GNNs) have recently garnered significant attention due to their exceptional performance across various applications, including hyperspectral (HS) image classification. However, most existing GNN-based models for HS image classification are limited depth models and often suffer from performance degradation as model depth increases. This study introduces HyperGCN, an exclusive GNN-based model designed with multiple graph convolutional layers to exploit the rich spectral information inherent in HS images, thereby enhancing classification performance. To address performance degradation, HyperGCN incorporates techniques resistant to oversmoothing into its architecture. Additionally, multiple-side exit branches are integrated into the intermediate layers of HyperGCN, enabling dynamic management of the complexity of HS images. Less complex HS images are processed by fewer layers, exiting early via attached branches, while more complex images traverse multiple layers until reaching the final output layer. Extensive experiments on four benchmark HS datasets (Indian Pines, Pavia University, Salinas, and Botswana) demonstrate HyperGCN’s superior performance over basic GNN-based models. Notably, HyperGCN outperforms or performs comparably to the CNN-GNN combined model in classifying HS images. Furthermore, the superior performance of multi-exit HyperGCN over its single-exit counterpart emphasizes the effectiveness of incorporating side exit branches in GNN-based HS image classification. Compared to state-of-the-art models, multi-exit HyperGCN demonstrates competitive performance, highlighting its effectiveness in handling complex spectral information in HS images while maintaining an acceptable balance between accuracy and computational efficiency.
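The multi-exit idea described above can be sketched as a loop that stops at the first side branch whose prediction is confident enough. The abstract does not state the exit criterion, so the probability threshold used here is an assumption, and `multi_exit_predict`, `layers`, and `exit_heads` are hypothetical names standing in for the model's layers and branches.

```python
def multi_exit_predict(x, layers, exit_heads, threshold=0.9):
    """Run layers in order and leave at the first sufficiently confident exit.

    layers     : list of callables, each mapping features to features
    exit_heads : list of callables (one per layer) mapping features to a
                 list of class probabilities
    Returns (predicted_class, exit_depth). The final head always answers.
    """
    if not layers or len(layers) != len(exit_heads):
        raise ValueError("need one exit head per layer and at least one layer")
    last_depth = len(layers) - 1
    for depth, (layer, head) in enumerate(zip(layers, exit_heads)):
        x = layer(x)
        probabilities = head(x)
        best = max(range(len(probabilities)), key=probabilities.__getitem__)
        # Assumed criterion: exit early once the top probability clears the threshold.
        if probabilities[best] >= threshold or depth == last_depth:
            return best, depth
```

Under this assumed rule, easy inputs return a small `exit_depth` and use fewer layers, while hard inputs traverse every layer and are answered by the final head.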
- Research Article
3
- 10.1109/tgrs.2024.3508737
- Jan 1, 2025
- IEEE Transactions on Geoscience and Remote Sensing
The rich spatial and spectral information in hyperspectral images (HSIs) makes spectral-spatial relationships essential for HSI classification (HSIC). Recent advancements indicate convolutional neural networks (CNNs) excel in HSIC but often struggle with precise spectral feature extraction. Moreover, the abundance of spectral information presents challenges in efficient feature representation and minimizing cross-domain interference. To address these limitations, we propose an efficient sequential spectral-spatial feature convolution network (S3FCN), employing successive subnetworks for spectral and spatial feature extraction with depthwise separable convolution. This approach balances the preservation of deep spectral and spatial features while significantly reducing network parameters, enhancing both performance and computational efficiency. We also introduce a sequential spectral-spatial attention module (S3AM) to integrate cross-domain correlations. This module utilizes spectral features from the preceding subnetwork and multilevel residual layers for in-depth exploration of spatial features, enabling deep integration for improved classification performance. The proposed architecture’s effectiveness is verified on five benchmark HSI datasets, including Pavia University, Salinas Valley, Kennedy Space Center, Indian Pines, and Houston 2013. Experimental results demonstrate that the sequential spectral-spatial connection in the feature extraction and attention mechanism integrated with depthwise separable convolution collectively surpasses current state-of-the-art (SOTA) techniques in classification accuracy with overall accuracies of 98.28%, 97.63%, 99.31%, 96.72%, and 95.38% across different datasets, while limiting the computation overhead, ensuring balanced network efficiency.
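The parameter saving attributed to depthwise separable convolution can be checked with simple arithmetic. The sketch below uses the standard weight counts with biases ignored; the function names are illustrative, and the channel numbers and kernel size in the usage note are arbitrary rather than taken from S3FCN.

```python
import math


def standard_conv_params(in_channels, out_channels, kernel):
    """Weights in a standard convolution: every output channel has one
    kernel spanning all input channels."""
    return in_channels * out_channels * math.prod(kernel)


def depthwise_separable_params(in_channels, out_channels, kernel):
    """Weights in a depthwise separable convolution: one spatial kernel per
    input channel (depthwise), then a 1x1 pointwise mix across channels."""
    depthwise = in_channels * math.prod(kernel)
    pointwise = in_channels * out_channels
    return depthwise + pointwise
```

With 64 input channels, 128 output channels, and a 3×3×3 kernel, the standard layer has 221,184 weights and the separable version has 9,920, roughly a 22-fold reduction.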
- Research Article
48
- 10.1109/tgrs.2023.3258488
- Jan 1, 2023
- IEEE Transactions on Geoscience and Remote Sensing
Hyperspectral image (HSI) classification aims to distinguish the category of a land coverage object for each pixel. The transformer architecture has been successfully introduced for the HSI classification task with promising performance. However, existing transformer-based HSI classification methods still suffer from the inability to fully explore both spectral information and spatial information in HSIs. To this end, we propose a Spectral-Spatial Token Enhanced Transformer (SSTE-Former) method with hash-based positional embedding, which is the first to exploit multiscale spectral-spatial information in depth for transformer-based HSI classification. Specifically, SSTE-Former accepts multiscale HSI cubes centered on the target pixel that have been preprocessed by PCA. Then, a designed multiscale CNN architecture is utilized to extract short-range spectral-spatial features and generate token embeddings. In parallel, a novel hash-based spatially enhanced positional embedding tailored for HSI cubes is developed to model the correlations within and across multiscale token embeddings. Finally, multiscale token embeddings and hash-based positional embeddings are concatenated and flattened into the transformer encoder for long-range spectral-spatial feature fusion. We conduct extensive experiments on four benchmark HSI datasets and achieve superior performance compared with the state-of-the-art HSI classification methods.
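The first step above, cutting cubes of several sizes around one target pixel, can be sketched as follows. The PCA step is omitted, and the zero padding at image borders, the window sizes, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
def extract_cube(image, row, col, size):
    """Cut a size x size window centered on (row, col) from an image stored
    as nested lists indexed [row][col][band]. Positions outside the image
    are filled with zeros (an assumed padding choice)."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a center pixel")
    half = size // 2
    height, width = len(image), len(image[0])
    bands = len(image[0][0])
    cube = []
    for r in range(row - half, row + half + 1):
        line = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < height and 0 <= c < width
            line.append(list(image[r][c]) if inside else [0.0] * bands)
        cube.append(line)
    return cube


def multiscale_cubes(image, row, col, sizes=(3, 5, 7)):
    """Return one cube per window size, all centered on the same pixel."""
    return [extract_cube(image, row, col, size) for size in sizes]
```

Every returned cube keeps the target pixel at its center, so the larger windows add surrounding context without shifting the pixel being classified.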
- Research Article
21
- 10.1007/s12517-020-05487-4
- Jun 1, 2020
- Arabian Journal of Geosciences
Creating accurate land use and land cover maps using remote sensing images is one of the most important applications of remotely sensed data. Abundant spectral information in hyperspectral images (HSI) makes it possible to distinguish materials that would not be distinguishable by multi-spectral sensors. Spectral and spatial information from HSI is of primary importance for image classification. In this study, a hybrid of a stacked autoencoder (SAE) architecture and a support vector machine (SVM) classifier was constructed to classify the HSI. The SAE architecture is built by stacking multiple autoencoder (AE) networks, each consisting of an encoder and a decoder. Spatial features in a neighbor region extracted from the principal component analysis (PCA) and the texture feature extracted from the gray-level cooccurrence matrix (GLCM) were fed into the classifier. It was found that the best result was from the combination of GLCM texture feature, PCA spatial feature, and spectral feature. Meanwhile, the representative features derived from the SAE deep learning network were better than the original features. This indicates that extracting representative features from hyperspectral images is a key step in improving classification accuracy.
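A gray-level co-occurrence matrix counts how often a pixel with gray level i has a neighbor with gray level j at a fixed offset, and texture features are statistics of that matrix. The sketch below is a generic illustration; the offset, the number of gray levels, the choice of contrast as the feature, and the function names are assumptions, since the abstract does not say which GLCM settings were used.

```python
def glcm(image, levels, offset=(0, 1)):
    """Normalized co-occurrence matrix for a 2D image of integer gray levels.

    offset is (row_step, col_step); (0, 1) pairs each pixel with its right
    neighbor. The matrix is not symmetrized.
    """
    row_step, col_step = offset
    height, width = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(height):
        for c in range(width):
            r2, c2 = r + row_step, c + col_step
            if 0 <= r2 < height and 0 <= c2 < width:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[value / total for value in row] for row in counts]


def glcm_contrast(matrix):
    """Contrast feature: co-occurrence probability weighted by squared
    gray-level difference."""
    size = len(matrix)
    return sum(matrix[i][j] * (i - j) ** 2
               for i in range(size) for j in range(size))
```

A uniform region concentrates the matrix on its diagonal and gives a contrast near zero, while frequent jumps between distant gray levels raise it.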
- Research Article
2
- 10.3390/s25041158
- Feb 13, 2025
- Sensors (Basel, Switzerland)
Change detection, as a popular research direction for dynamic monitoring of land cover change, usually uses hyperspectral remote-sensing images as data sources. Hyperspectral images have rich spatial–spectral information, but traditional change detection methods have limited ability to express the features of hyperspectral images, and it is difficult to identify the complex detailed features, semantic features, and spatial–temporal correlation features in two-phase hyperspectral images. Effectively using the abundant spatial and spectral information in hyperspectral images to complete change detection is a challenging task. This paper proposes a hyperspectral image change detection method based on the balanced metric, which uses the spatiotemporal attention module to translate bi-temporal hyperspectral images to the same eigenspace, uses the deep Siamese network structure to extract deep semantic features and shallow spatial features, and measures sample features according to the Euclidean distance. In the training phase, the model is optimized by minimizing the loss of distance maps and label maps. In the testing phase, the prediction map is generated by simple thresholding of distance maps. Experiments show that on the four datasets, the proposed method can achieve a good change detection effect.
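The testing phase described above, thresholding a per-pixel Euclidean distance map, can be sketched as follows. The feature maps would come from the Siamese network, which is not modeled here; the function names, the nested-list layout, and the threshold value in the usage note are illustrative assumptions.

```python
import math


def distance_map(features_t1, features_t2):
    """Per-pixel Euclidean distance between two feature maps stored as
    nested lists indexed [row][col][channel]."""
    return [[math.dist(a, b) for a, b in zip(row1, row2)]
            for row1, row2 in zip(features_t1, features_t2)]


def change_map(distances, threshold):
    """Binary prediction: 1 where the distance exceeds the threshold."""
    return [[1 if d > threshold else 0 for d in row] for row in distances]
```

For two one-row feature maps `[[[0, 0], [1, 1]]]` and `[[[3, 4], [1, 1]]]`, the distances are `[[5.0, 0.0]]`, and a threshold of 1.0 marks only the first pixel as changed.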
- Conference Article
1
- 10.1109/icicsp55539.2022.10050698
- Nov 26, 2022
Hyperspectral image (HSI) classification is a key technology in remote sensing image processing. In recent years, the convolutional neural network (CNN), a powerful feature extractor, has been introduced into the field of HSI classification. Since the features of HSI are the basis of HSI classification, how to effectively extract spectral-spatial features from HSI with CNNs has become a research hotspot. HSI feature extraction networks based on two-dimensional (2D) and three-dimensional (3D) CNNs can extract both spectral and spatial information, but may increase the number of parameters and the computational cost. Compared with 2D CNN and 3D CNN, one-dimensional (1D) CNN greatly reduces the number of parameters and the computational cost. However, 1D CNN based algorithms can only extract spectral information without considering spatial information. Therefore, in this paper, a lightweight multilevel feature fusion network (LMFFN) is proposed for HSI classification, which aims to achieve efficient extraction of spectral-spatial features and to minimize the number of parameters. The main contributions of this paper are the following two points. First, we design a hybrid spectral-spatial feature extraction network (HSSFEN) to combine the advantages of 1D, 2D and 3D CNN. It adopts depthwise separable convolution, which effectively reduces the complexity of the proposed HSSFEN. Then, a multilevel spectral-spatial feature fusion network (MSSFFN) is proposed to obtain more effective spectral-spatial features by fusing the bottom spectral-spatial features and the top spectral-spatial features. To demonstrate the performance of our proposed method, a series of experiments are conducted on three HSI datasets: Indian Pines, University of Pavia, and Salinas Scene. 
The experimental results indicate that our proposed LMFFN is able to achieve better performance than the manual feature extraction methods and deep learning methods, which demonstrates the superiority of our proposed method.
- Research Article
11
- 10.1109/tgrs.2022.3161139
- Jan 1, 2022
- IEEE Transactions on Geoscience and Remote Sensing
Representation learning methods, such as sparse representation (SR) and collaborative representation (CR), have been widely used in hyperspectral image classification. However, they merely considered the similarities between features. Due to the plentiful spatial and spectral information in hyperspectral images, the differences between features also need to be considered. Relaxed CR (RCR) is used in face recognition to accommodate the difference and similarity of features simultaneously. In this article, a novel method of RCR with band weighting based on superpixel segmentation is proposed for hyperspectral image classification. The $l_2$ norm on band coefficients and global average coefficients is exploited to ensure the similarity, and the variance determines the specific coefficient-related weight of each band. The training set is selected from each superpixel, which is considered as a subgraph rather than independent pixels. It is favorable for concentrating on the difference between similar bands since the samples in each superpixel are of high similarity. Furthermore, extended multiattribute profile (EMAP) features, Gabor features, and local binary pattern (LBP) features are employed to increase the diversity of features; thus, a method of multifeatures’ RCR based on superpixels is proposed. Three typical datasets are used to validate the related algorithms. The experiments demonstrate that the proposed algorithms can effectively improve classification accuracy compared to state-of-the-art classifiers.
- Research Article
47
- 10.3390/rs15122990
- Jun 8, 2023
- Remote Sensing
In recent years, hyperspectral image classification techniques have attracted a lot of attention from many scholars because they can be used to model the development of different cities and provide a reference for urban planning and construction. However, due to the difficulty in obtaining hyperspectral images, only a limited number of pixels can be used as training samples. Therefore, how to adequately extract and utilize the spatial and spectral information of hyperspectral images with limited training samples has become a difficult problem. To address this issue, we propose a hyperspectral image classification method based on dense pyramidal convolution and multi-feature fusion (DPCMF). In this approach, two branches are designed to extract spatial and spectral features, respectively. In the spatial branch, dense pyramid convolutions and non-local blocks are used to extract multi-scale local and global spatial features in image samples, which are then fused to obtain spatial features. In the spectral branch, dense pyramidal convolution layers are used to extract spectral features in image samples. Finally, the spatial and spectral features are fused and fed into fully connected layers to obtain classification results. The experimental results show that the overall accuracy (OA) of the method proposed in this paper is 96.74%, 98.10%, 98.92% and 96.67% on the four hyperspectral datasets, respectively. Significant improvements are achieved compared to five methods for hyperspectral classification: SVM, SSRN, FDSSC, DBMA and DBDA. Therefore, the proposed method can better extract and exploit the spatial and spectral information in image samples when the number of training samples is limited, and can provide more realistic and intuitive terrain and environmental conditions for urban planning, design, construction and management.
- Research Article
17
- 10.1016/j.compmedimag.2024.102339
- Jan 19, 2024
- Computerized Medical Imaging and Graphics
CrossU-Net: Dual-modality cross-attention U-Net for segmentation of precancerous lesions in gastric cancer
- Research Article
10
- 10.1016/j.patrec.2023.12.023
- Jan 5, 2024
- Pattern Recognition Letters
SemanticFormer: Hyperspectral image classification via semantic transformer
- Conference Article
7
- 10.1109/fskd.2017.8393336
- Jul 1, 2017
Hyperspectral image (HSI) classification is one of the most persistent issues in the remote sensing field. Recently, deep learning has attracted attention in the HSI classification field due to its accuracy and stronger generalization. This paper proposes a new spectral-spatial HSI classification approach built on stacked autoencoder (SAE) based deep feature extraction and hidden Markov random field (HMRF) based segmentation. Specifically, the SAE model is first implemented as a spectral information-based classifier to extract the deep spectral features. Second, spatial information is obtained by using an effective HMRF-based segmentation technique. Finally, a maximum voting criterion is employed to merge the extracted spectral and spatial information, which results in a precise spectral-spatial HSI classification. Characterizing the HSI with spectral-spatial features leads to a more comprehensive analysis of the HSI and to a more accurate classification. In general, combining the spectral information from the SAE process with the spatial information from HMRF-based segmentation by means of the maximum voting criterion has a significant effect on the accuracy of HSI classification. Experiments on real, diverse hyperspectral data sets with different contexts and resolutions, acquired by the AVIRIS and ROSIS sensors, show the accuracy of the proposed method and confirm that its results are comparable to those of several recently proposed HSI classification techniques.
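One common reading of the voting step above is that every pixel in a segment takes the class that the pixel-wise classifier predicted most often inside that segment. The sketch below implements that reading; the exact rule used in the paper is not given in the abstract, and `majority_vote_by_segment` is a hypothetical name.

```python
from collections import Counter


def majority_vote_by_segment(pixel_labels, segment_ids):
    """Replace each pixel's label with the most frequent label in its segment.

    pixel_labels : 2D list of class labels from a pixel-wise classifier
    segment_ids  : 2D list of the same shape giving each pixel's segment
    Ties go to the label encountered first in row-major order.
    """
    votes = {}
    for label_row, segment_row in zip(pixel_labels, segment_ids):
        for label, segment in zip(label_row, segment_row):
            votes.setdefault(segment, Counter())[label] += 1
    winners = {segment: counts.most_common(1)[0][0]
               for segment, counts in votes.items()}
    return [[winners[segment] for segment in segment_row]
            for segment_row in segment_ids]
```

The effect is to smooth isolated misclassified pixels: in a segment labeled `[0, 0, 1]` by the spectral classifier, the single `1` is overruled and the whole segment becomes class `0`.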