Spectral-Spatial Transformer for Hyperspectral Image Sharpening
Convolutional neural networks (CNNs) have recently achieved outstanding performance for hyperspectral (HS) and multispectral (MS) image fusion. However, their local receptive fields prevent CNNs from exploring long-range dependencies in HS and MS image fusion. To overcome this limitation, a transformer is proposed to leverage long-range dependencies from the network inputs. Thanks to this long-range modeling ability, transformers outperform pure CNNs on many tasks, yet their use for HS and MS image fusion remains unexplored. In this article, we propose a spectral-spatial transformer (SST) to demonstrate the potential of transformers for HS and MS image fusion. We first devise two branches that extract spectral and spatial features from the HS and MS images with SST blocks, which explore the spectral and spatial long-range dependencies, respectively. The spectral and spatial features are then fused, and the result is fed back to both branches for information interaction. Finally, the high-resolution (HR) HS image is reconstructed through dense links from all the fused features to make full use of them. The experimental analysis demonstrates the high performance of the proposed approach compared with several state-of-the-art (SOTA) methods.
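The spectral branch's long-range modeling can be illustrated with a minimal spectral-wise self-attention, where each band is one token and the attention matrix relates every band to every other band. This is an illustrative numpy sketch, not the authors' implementation: the learned Q/K/V projections are replaced by identity mappings.

```python
import numpy as np

def spectral_self_attention(hsi):
    """Self-attention across spectral bands: each band is one token, so
    every band can attend to every other band regardless of spatial
    distance. hsi: array of shape (H, W, C)."""
    H, W, C = hsi.shape
    tokens = hsi.reshape(H * W, C).T              # (C, H*W): one token per band
    # Identity projections stand in for the learned Q/K/V weights.
    q, k, v = tokens, tokens, tokens
    scores = q @ k.T / np.sqrt(tokens.shape[1])   # (C, C) band-to-band affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over bands
    return (attn @ v).T.reshape(H, W, C)
```

Because the attention matrix is only C x C, the cost of such a spectral branch grows linearly with the number of pixels, which is what makes band-wise attention practical on large images.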
- Conference Article
1
- 10.1109/igarss46834.2022.9884194
- Jul 17, 2022
Convolutional neural networks (CNNs) have achieved impressive performance for hyperspectral (HS) and multispectral (MS) image fusion in recent years. They extract features with local filters, which limits their ability to explore long-range dependencies in the input images. However, long-range dependence is an important cue for HS and MS image fusion, as it supports the exploration of spatial self-similarity and spectral dependence. To take advantage of long-range dependence, we propose a spectral-spatial transformer (SST) for MS and HS image fusion. The experimental results demonstrate the high performance of the proposed approach compared to several state-of-the-art methods.
- Research Article
312
- 10.1109/tnnls.2020.2980398
- Mar 1, 2021
- IEEE Transactions on Neural Networks and Learning Systems
Hyperspectral image (HSI) and multispectral image (MSI) fusion, which fuses a low-spatial-resolution HSI (LR-HSI) with a higher-resolution MSI, has become a common scheme to obtain a high-resolution HSI (HR-HSI). This article presents a novel HSI and MSI fusion method (called CNN-Fus) based on subspace representation and a convolutional neural network (CNN) denoiser, i.e., a well-trained CNN for gray-image denoising. Our method only needs to train the CNN on the more accessible gray images and can be applied directly to any HSI and MSI datasets without retraining. First, to exploit the high correlations among the spectral bands, we approximate the desired HR-HSI with a low-dimensional subspace multiplied by coefficients, which not only speeds up the algorithm but also leads to more accurate recovery. Since the spectral information mainly resides in the LR-HSI, we learn the subspace from it via singular value decomposition. Owing to the powerful learning performance and high speed of CNNs, we use the well-trained gray-image CNN denoiser to regularize the estimation of the coefficients. Specifically, we plug the CNN denoiser into the alternating direction method of multipliers (ADMM) algorithm to estimate the coefficients. Experiments demonstrate that our method outperforms state-of-the-art fusion methods.
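The subspace step can be sketched as follows. This is an illustrative numpy fragment under stated assumptions (a plain SVD yielding an orthonormal basis); it omits the ADMM loop in which the CNN denoiser regularizes the coefficients.

```python
import numpy as np

def learn_spectral_subspace(lr_hsi, rank):
    """Learn a spectral subspace E from the LR-HSI via SVD, as CNN-Fus
    does: the HR-HSI is then approximated as E @ A with low-dimensional
    coefficients A. lr_hsi: array of shape (h, w, C)."""
    h, w, C = lr_hsi.shape
    X = lr_hsi.reshape(h * w, C).T            # (C, h*w): spectra as columns
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]                        # (C, rank) orthonormal basis

def project_coefficients(hsi, E):
    """Coefficients of an image in the subspace E (orthonormal columns)."""
    H, W, C = hsi.shape
    X = hsi.reshape(H * W, C).T
    return E.T @ X                            # (rank, H*W)
```

Working with the (rank, H*W) coefficients instead of the full (C, H*W) cube is what both speeds up the algorithm and lets a gray-image denoiser act on each coefficient map.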
- Research Article
95
- 10.1016/j.inffus.2023.102148
- Nov 19, 2023
- Information Fusion
Reciprocal transformer for hyperspectral and multispectral image fusion
- Research Article
8
- 10.3390/app11010288
- Dec 30, 2020
- Applied Sciences
In this paper, a detail-injection method based on a coupled convolutional neural network (CNN) is proposed for hyperspectral (HS) and multispectral (MS) image fusion, with the goal of enhancing the spatial resolution of HS images. Owing to the excellent spectral fidelity of the detail-injection model and the spatial-spectral feature extraction ability of CNNs, the proposed method uses a pair of coupled CNNs to extract features and learn details from the HS and MS images individually. By appending an additional convolutional layer, the extracted features of the two images are concatenated to predict the missing details of the anticipated HS image. Experiments on simulated and real HS and MS data show that, compared with several state-of-the-art HS and MS image fusion methods, our proposed method achieves better fusion results, provides excellent spectrum-preservation ability, and is easy to implement.
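The detail-injection form that such methods build on can be sketched as below. The per-band gain vector and the box-filter high-pass are illustrative assumptions; in the paper the missing details are predicted by the coupled CNNs rather than computed analytically.

```python
import numpy as np

def high_pass(ms_band, k=3):
    """Spatial details as the image minus a k x k box-filter low-pass
    (a simple analytic stand-in for CNN-predicted details)."""
    H, W = ms_band.shape
    pad = k // 2
    padded = np.pad(ms_band, pad, mode="edge")
    low = np.zeros_like(ms_band)
    for dy in range(k):
        for dx in range(k):
            low += padded[dy:dy + H, dx:dx + W]
    return ms_band - low / (k * k)

def inject_details(upsampled_hs, details, gains):
    """Classic detail injection: fused band = upsampled HS band plus a
    gain-weighted spatial detail. upsampled_hs: (H, W, C);
    details: (H, W); gains: (C,)."""
    return upsampled_hs + gains[None, None, :] * details[:, :, None]
```

The per-band gains are what let the injection respect spectral fidelity: bands weakly correlated with the MS image receive smaller detail contributions.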
- Research Article
31
- 10.1109/tgrs.2022.3208125
- Jan 1, 2022
- IEEE Transactions on Geoscience and Remote Sensing
Hyperspectral images (HSIs) are used in an increasingly wide range of applications, but their low spatial resolution seriously limits their usefulness. Fusing a low-resolution hyperspectral image (LR-HSI) with a high-resolution multispectral image (HR-MSI) has become a mainstream way to achieve super-resolution reconstruction of HSIs. However, most existing fusion methods do not make full use of the large-scale range of remote sensing images and neglect the preservation of spatial-spectral information during fusion. Considering that the spectral information in the fused high-resolution hyperspectral image (HR-HSI) mainly depends on the HSI and the spatial information mainly depends on the MSI, this paper proposes a full-scale linked Unet with spatial-spectral joint perceptual attention for hyperspectral and multispectral image fusion (FSL-Unet). FSL-Unet consists of two modules. The first is the spatial-spectral attention extraction module (SSAE), which computes the spectral attention of the LR-HSI and the spatial attention of the HR-MSI at different scales. The second is the full-scale link U-shaped fusion module (FLUF), which adopts a multi-level feature extraction strategy with denser full-scale skip connections to explore feature information at a finer granularity, enabling flexible combination of multi-scale and multi-path features. We also propose spatial-spectral joint perceptual attention (SSJPA) on the encoder side of FLUF. SSJPA makes full use of the attention maps computed by the SSAE and effectively embeds spatial and spectral information into the fused image, enabling uninterrupted information transfer and aggregation. To demonstrate the effectiveness of FSL-Unet, we selected five public hyperspectral datasets for experiments. Compared with eight other state-of-the-art fusion methods, the experimental results show that FSL-Unet achieves competitive results.
The source code for FSL-Unet can be downloaded from https://github.com/wxy11-27/FSL-Unet.
- Research Article
31
- 10.1109/lgrs.2022.3229692
- Jan 1, 2023
- IEEE Geoscience and Remote Sensing Letters
The key to hyperspectral image (HSI) and multispectral image (MSI) fusion is to exploit the inter-spectral self-similarity of HSIs and the spatial correlations of MSIs. However, leading convolutional neural network (CNN)-based methods fall short in capturing long-range dependencies and the self-similarity prior. To this end, we propose a simple yet efficient transformer-based network, the hyperspectral and multispectral image fusion (HMF)-Former, for HSI/MSI fusion. The HMF-Former adopts a U-shaped architecture with a spatio-spectral transformer block (SSTB) as the basic unit. In the SSTB, embedded spatial-wise multihead self-attention (Spa-MSA) and spectral-wise multihead self-attention (Spe-MSA) effectively capture interactions between spatial regions and inter-spectral dependencies, respectively, matching the spatial-correlation property of MSIs and the inter-spectral self-similarity of HSIs. In addition, the specially designed SSTB enables the HMF-Former to capture both local and global features while maintaining linear complexity. Extensive experiments on four benchmark datasets show that our method significantly outperforms state-of-the-art methods.
- Research Article
5
- 10.1109/tgrs.2022.3225577
- Jan 1, 2022
- IEEE Transactions on Geoscience and Remote Sensing
Recently, deep convolutional neural network-based hyperspectral and multispectral image fusion methods have shown significant performance. Nevertheless, the rich spatial and spectral details of hyperspectral images (HSIs) have not been fully explored, leaving room to further improve the representation ability of such models. In this paper, we propose an efficient cross-modality self-calibrated network (CMSCN) for hyperspectral and multispectral image fusion. Specifically, we use a cross-modality non-local module to fuse a high-resolution multispectral image (HR-MSI) and a low-resolution hyperspectral image (LR-HSI) into an enhanced LR-HSI. In addition, a novel cross-scale self-calibrated convolution structure is proposed to explore and exploit multi-scale, hierarchical spatial-spectral features, which improves the learning ability of the model. The introduced efficient spatial-spectral attention mechanism calibrates the feature representation along different dimensions, thereby providing more efficient and accurate information for hyperspectral image reconstruction. Extensive experimental results on various hyperspectral images demonstrate the superiority of our method over state-of-the-art image fusion methods.
- Research Article
243
- 10.1016/j.inffus.2020.11.001
- Nov 13, 2020
- Information Fusion
Recent advances and new guidelines on hyperspectral and multispectral image fusion
- Conference Article
2
- 10.1109/igarss47720.2021.9553692
- Jul 11, 2021
Hyperspectral images (HSIs) provide rich spectral information and have been used in applications such as object detection and environmental protection. However, HSIs suffer from low spatial resolution owing to the limitations of imaging systems. Hyperspectral and multispectral image (MSI) fusion is an efficient way to enhance the spatial resolution of HSIs. Over the past decades, many HSI-MSI fusion algorithms have been presented in the literature. In this paper, we present a comprehensive review of HSI-MSI fusion methods. According to their characteristics and trends, they are categorized into two classes: model-driven approaches and data-driven approaches. We clarify the characteristics and advantages of each class, and compare and discuss the fusion methods in each category. Additionally, we analyze the existing challenges and present potential research directions for HSI-MSI fusion.
- Research Article
7
- 10.1080/01431161.2022.2109223
- Jun 3, 2022
- International Journal of Remote Sensing
The coarse spatial resolution of hyperspectral (HS) satellite images limits their use in many applications. The spatial structure quality of HS images can be improved by fusing them with either higher-resolution panchromatic (PAN) images or higher-resolution multispectral (MS) images. HS images can be fused both with methods designed for MS-PAN fusion and with methods developed specifically for HS-MS fusion. The wide variety of available HS-MS and MS-PAN image fusion techniques leaves users uncertain which method(s) to use for optimal fusion performance. Hence, the current study qualitatively and quantitatively assessed the HS image fusion performance of 15 MS-PAN fusion methods and 17 state-of-the-art HS-MS fusion techniques across four experiments, with the aim of giving some clues about the performance of the techniques used. Experiments showed that the HS-MS fusion methods exhibited much better HS image fusion performance than the MS-PAN fusion methods. It was also concluded that the coupled nonnegative matrix factorization (CNMF), convolutional neural network denoiser-based method (CNN-D), HS super-resolution (HySure), and fast fusion based on the Sylvester equation with a naive Gaussian prior (FUSE-G) techniques provided the most robust fusion results.
- Research Article
21
- 10.1109/tgrs.2022.3204769
- Jan 1, 2022
- IEEE Transactions on Geoscience and Remote Sensing
Fusing hyperspectral images, which have low spatial and high spectral resolution, with multispectral images, which have high spatial and low spectral resolution, is an important way to improve spatial resolution. Existing deep learning-based image fusion technologies usually neglect the ability of neural networks to understand differential features. In addition, their loss constraints do not stem from the physical characteristics of hyperspectral imaging sensors. We propose a self-supervised loss and a spatially and spectrally separable loss. 1) The self-supervised loss: unlike previous approaches that directly stack the upsampled hyperspectral and multispectral images as input, we expect the processed hyperspectral images to preserve the integrity of the hyperspectral information while achieving the most reasonable balance between overall spatial and spectral features. First, the pre-interpolated hyperspectral images are decomposed into subspaces that serve as self-supervised labels. A network is then designed to learn the subspace information and obtain the most discriminative features. 2) The separable loss: according to the physical characteristics of hyperspectral images, the pixel-based mean squared error loss is first divided into a spatial-domain loss and a spectral-domain loss; a similarity score between the images is then computed and used to construct the weighting coefficients of the two domain losses. Finally, the separable loss is expressed jointly through these weights. Experiments on public benchmark datasets indicate that the self-supervised loss and the separable loss improve fusion performance.
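A minimal version of such a separable loss might look like the following. The split into a spatial (per-band MSE) term and a spectral (per-pixel cosine) term follows the description above, but the cosine-similarity weighting rule is an illustrative assumption, not the paper's exact similarity score.

```python
import numpy as np

def separable_loss(pred, target, eps=1e-8):
    """Sketch of a spatially/spectrally separable loss for HSI fusion:
    pixelwise error is split into a spatial-domain term and a
    spectral-domain term, then recombined with similarity-based
    weights. pred, target: arrays of shape (H, W, C)."""
    # Spatial-domain term: per-band image fidelity (mean squared error).
    spatial = np.mean((pred - target) ** 2)
    # Spectral-domain term: 1 minus mean cosine similarity of spectra.
    p = pred.reshape(-1, pred.shape[-1])
    t = target.reshape(-1, target.shape[-1])
    cos = np.sum(p * t, axis=1) / (
        np.linalg.norm(p, axis=1) * np.linalg.norm(t, axis=1) + eps)
    spectral = 1.0 - cos.mean()
    # Assumed weighting: more spectral distortion shifts weight to it.
    w = 1.0 / (1.0 + spectral)
    return w * spatial + (1.0 - w) * spectral
```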
- Conference Article
5
- 10.1109/igarss39084.2020.9323227
- Sep 26, 2020
Hyperspectral (HS) and multispectral (MS) image fusion is an important task for constructing an HS image with high spatial and spectral resolution. In this paper, we present a novel HS and MS fusion method using non-convex low-rank tensor approximation and total variation regularization. Specifically, a Laplace-based low-rank model is formed to exploit the spatial-spectral correlation and nonlocal similarity of the HS image, and second-order total variation is used to describe the local smoothness structure in the spatial domain and across adjacent bands. An effective optimization algorithm is also designed for the proposed model. In the experiments, we demonstrate the superiority of the proposed method over several state-of-the-art approaches.
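The second-order total variation regularizer mentioned above can be written directly from second differences; a minimal single-band sketch (the paper applies it across spatial dimensions and adjacent bands):

```python
import numpy as np

def second_order_tv(band):
    """Second-order total variation of one band: the l1 norm of second
    differences along rows and columns. It penalizes curvature rather
    than slope, so it favors piecewise-smooth structure instead of the
    piecewise-constant look of first-order TV."""
    d_rows = band[2:, :] - 2 * band[1:-1, :] + band[:-2, :]
    d_cols = band[:, 2:] - 2 * band[:, 1:-1] + band[:, :-2]
    return np.abs(d_rows).sum() + np.abs(d_cols).sum()
```

Note that this regularizer vanishes on linear ramps, which is why it preserves gradual intensity transitions that first-order TV would flatten.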
- Research Article
99
- 10.1016/j.knosys.2023.110362
- Feb 3, 2023
- Knowledge-Based Systems
MCT-Net: Multi-hierarchical cross transformer for hyperspectral and multispectral image fusion
- Book Chapter
- 10.1007/978-3-031-02444-3_2
- Jan 1, 2022
Hyperspectral and multispectral image fusion aims to generate a high-spatial-resolution hyperspectral image (HR-HSI) by exploiting the complementarity and redundancy of the low-spatial-resolution hyperspectral image (LR-HSI) and the high-spatial-resolution multispectral image (HR-MSI). Previous works usually assume that the spatial down-sampling operator between the HR-HSI and the LR-HSI and the spectral response function between the HR-HSI and the HR-MSI are known, which is infeasible in many cases. In this paper, we propose a coarse-to-fine fusion network that requires no prior on the mapping between the HR-HSI and the LR-HSI or HR-MSI. The result is further improved by iterating the proposed structure. Our model is composed of three blocks: a degradation block, an error-map fusion block, and a reconstruction block. The degradation block simulates the spatial and spectral down-sampling processes of hyperspectral images. Error maps in the spatial and spectral domains are then obtained by subtracting the degradation results from the inputs. The error-map fusion block fuses those errors to obtain error maps specific to the initialized HSI. Provided the learned degradation process represents the real mapping function, this block generates accurate errors between the degraded images and the ground truth. The reconstruction block uses the fused maps to correct the HSI and finally produces high-precision hyperspectral images. Experimental results on the CAVE and Harvard datasets indicate that the proposed method performs well both visually and quantitatively compared with some SOTA methods.
Keywords: Hyperspectral image, Image fusion, Deep learning, Degradation model
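The degradation block's two simulated operators, and the error maps derived from them, can be sketched as follows. Block averaging and a fixed spectral response matrix are simple stand-ins for the learned degradations in the paper.

```python
import numpy as np

def spatial_degrade(hr_hsi, factor):
    """Spatial down-sampling by block averaging: simulates the mapping
    from HR-HSI to LR-HSI. hr_hsi: (H, W, C), H and W divisible by factor."""
    H, W, C = hr_hsi.shape
    blocks = hr_hsi.reshape(H // factor, factor, W // factor, factor, C)
    return blocks.mean(axis=(1, 3))

def spectral_degrade(hr_hsi, srf):
    """Spectral down-sampling with a response matrix srf of shape
    (n_ms_bands, C): simulates the mapping from HR-HSI to HR-MSI."""
    return hr_hsi @ srf.T

def error_maps(hr_est, lr_hsi, hr_msi, factor, srf):
    """Residuals between the degraded estimate and the observed inputs;
    these are the spatial and spectral error maps the fusion block consumes."""
    e_spatial = lr_hsi - spatial_degrade(hr_est, factor)
    e_spectral = hr_msi - spectral_degrade(hr_est, srf)
    return e_spatial, e_spectral
```

When the estimate matches the true HR-HSI, both error maps vanish, which is the signal the reconstruction block uses to stop correcting.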
- Research Article
9
- 10.1109/tgrs.2023.3254556
- Jan 1, 2023
- IEEE Transactions on Geoscience and Remote Sensing
Improving the spatial resolution of hyperspectral (HS) images is of great significance for subsequent applications. As the multispectral (MS) image can provide abundant complementary land-cover spatial information, hyperspectral and multispectral image fusion (HMF) has become a mainstream way to generate HS images with both high spatial and high spectral resolution. HMF has witnessed rapid progress by leveraging dictionary learning techniques. However, existing approaches are highly sensitive to image registration accuracy, and their reconstruction performance on the non-overlapped spectral bands between the HS and MS images is extremely limited. To alleviate the effect of image misregistration and enrich the spectral information of the non-overlapped bands, a general HMF dictionary learning framework that accounts for both non-overlapped spectral band reconstruction and image misregistration is proposed in this paper. For registration error, the proposed method uses improved dictionary learning to rectify the spectral-information matching gap between the HS and MS images that exists in traditional HMF methods. Meanwhile, for the non-overlapped spectral bands, a novel coefficient optimization strategy is adopted to improve their reconstruction. The registration error can thus be avoided to the greatest extent, and the accuracy of non-overlapped band reconstruction can be effectively improved. Experiments on both simulated and real-world datasets demonstrate that the proposed method effectively tackles the registration error problem and increases HMF accuracy over different spectral ranges. The proposed framework also provides guidance for dictionary learning-based HMF methods with various constraints to improve non-overlapped band reconstruction accuracy.
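The non-overlapped-band idea can be illustrated with a toy calculation: coefficients estimated only from the bands the MS sensor observes are applied to a dictionary spanning the full spectral range, extrapolating the bands the MS image never saw. Plain least squares stands in for the paper's coefficient optimization strategy.

```python
import numpy as np

def reconstruct_full_spectrum(ms_pixel, D, R):
    """Sketch of non-overlapped-band reconstruction in dictionary-based
    HMF. D (C x K) is a spectral dictionary spanning the full HS range;
    R (m x C) is a spectral response covering only the bands the MS
    sensor observes (m < C). Coefficients are fit on the observed bands
    and then applied to the full dictionary."""
    RD = R @ D                                     # (m, K) observed dictionary
    alpha, *_ = np.linalg.lstsq(RD, ms_pixel, rcond=None)
    return D @ alpha                               # full C-band spectrum
```

The recovery is exact only when the true spectrum lies in the span of D and the observed dictionary RD has full column rank; this is why the quality of the learned dictionary governs how well the non-overlapped bands are filled in.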