Abstract

Fusing a low-resolution hyperspectral image (LR-HSI) with a high-resolution multispectral image (HR-MSI) is a prominent approach to obtaining a high-resolution hyperspectral image (HR-HSI). Numerous methods based on convolutional neural networks (CNNs) have been proposed for hyperspectral image (HSI) and multispectral image (MSI) fusion. However, because convolutional kernels have a limited, local receptive field, these CNN-based methods may fail to capture globally relevant features of the input image. To obtain more accurate fusion results, we propose a spatial-spectral transformer-based U-net (SSTF-Unet), which captures dependencies between distant features and explores the intrinsic information of the images. Specifically, a spatial transformer block (SATB) and a spectral transformer block (SETB) compute spatial and spectral self-attention, respectively, and are connected in parallel to form a spatial-spectral fusion block (SSFB). Following the U-net architecture, we build SSTF-Unet by stacking several SSFBs to fuse spatial-spectral features at multiple scales. Experimental results on public HSI datasets show that SSTF-Unet outperforms existing HSI and MSI fusion approaches.
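The abstract does not specify the internals of SATB, SETB, or how their outputs are merged, so the following is only a minimal single-head NumPy sketch of the distinction between the two kinds of self-attention on an H x W x C feature map. Spatial attention treats the H*W pixels as tokens (an HW x HW attention map), while spectral attention, as assumed here, treats the C bands as tokens (a C x C attention map computed across all pixels). The function names `spatial_attention`, `spectral_attention`, and `ssfb`, the random projection weights, and the "sum plus residual" merge of the two parallel branches are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat, wq, wk, wv):
    """Hypothetical SATB core: the H*W pixels are tokens (attention map HW x HW)."""
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)                  # (HW, C)
    q, k, v = x @ wq, x @ wk, x @ wv            # each (HW, C)
    attn = softmax(q @ k.T / np.sqrt(c))        # (HW, HW): pixel-to-pixel weights
    return (attn @ v).reshape(h, w, c)

def spectral_attention(feat, wq, wk, wv):
    """Hypothetical SETB core: the C bands are tokens (attention map C x C)."""
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)                  # (HW, C)
    q, k, v = x @ wq, x @ wk, x @ wv            # each (HW, C)
    attn = softmax(q.T @ k / np.sqrt(h * w))    # (C, C): band-to-band weights
    return (v @ attn.T).reshape(h, w, c)        # mix bands at every pixel

def ssfb(feat, spa_w, spe_w):
    """Parallel spatial and spectral branches.

    ASSUMPTION: the branches are merged by summation plus a residual connection;
    the abstract only states that SATB and SETB are connected in parallel.
    """
    return feat + spatial_attention(feat, *spa_w) + spectral_attention(feat, *spe_w)

rng = np.random.default_rng(0)
H, W, C = 8, 8, 31                              # toy sizes, chosen arbitrarily
feat = rng.standard_normal((H, W, C))
spa_w = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]  # Wq, Wk, Wv
spe_w = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
out = ssfb(feat, spa_w, spe_w)
print(out.shape)  # (8, 8, 31)
```

Both branches preserve the H x W x C shape, which is what allows them to run in parallel and be merged elementwise. The spatial map grows quadratically with the number of pixels, whereas the spectral map depends only on the number of bands, one reason a U-net-style design that attends at several downsampled scales can be attractive.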

