Abstract

Remote sensing image fusion, also called pansharpening, aims to fuse a high-spatial-resolution single-band panchromatic (PAN) image with a spectrally informative multispectral (MS) image to generate a pansharpened image that combines high resolution with color information. Most existing pansharpening methods built on a single convolutional neural network (CNN) or a single transformer suffer from problems such as an inability to capture long-range features or difficulty in training, resulting in the loss of spatial details and colors. In addition, the computational complexity of the transformer cannot be ignored. In this work, we propose a dual-branch hybrid CNN-Transformer network (DBCT-Net) that exploits the local specificity of the CNN and models global dependencies with the transformer. First, a multi-branch densely connected block (MDCB-4) is designed to extract spectral and textural information from the MS and PAN images, respectively. Next, an encoder-decoder transformer based on self-attention and co-attention modules injects the missing local and global information, further enhancing the results. Notably, an inverted multi-head transposed attention (IMTA) is applied here to build attention maps over the feature dimension, which greatly reduces computation time. Finally, an image reconstruction module effectively fuses the extracted texture and spectral features. Furthermore, to generate visually better pansharpened images, we propose a combined loss function that includes a focal frequency loss. Extensive experiments on the WorldView II (WV2), GF-2, and QuickBird (QB) datasets show that DBCT-Net performs better in spatial preservation and spectral feature recovery.
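The abstract states that IMTA builds attention maps over the feature (channel) dimension rather than the spatial one, so the attention cost scales with the number of channels instead of the number of pixels. The paper's exact IMTA design is not given here; the following is a minimal PyTorch sketch of the generic transposed (channel-wise) attention idea it refers to, and the module name, head count, and 1x1-convolution projections are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of channel-wise ("transposed") multi-head attention. This is NOT the
# paper's IMTA module, whose internals are not described in the abstract; it
# only illustrates why attending over channels is cheap: the attention map is
# (C/heads) x (C/heads), independent of the H x W spatial resolution.
class TransposedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        # Learnable per-head scaling of the attention logits.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Flatten space: (batch, heads, channels_per_head, pixels).
        q = q.reshape(b, self.num_heads, c // self.num_heads, h * w)
        k = k.reshape(b, self.num_heads, c // self.num_heads, h * w)
        v = v.reshape(b, self.num_heads, c // self.num_heads, h * w)
        # Normalize along the pixel axis so the channel-channel logits stay bounded.
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        # Channel-by-channel attention map, size (C/heads) x (C/heads).
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.project_out(out)


# Example: the attention map is 16x16 per head (64 channels / 4 heads),
# regardless of the 128x128 spatial grid.
layer = TransposedAttention(dim=64, num_heads=4)
y = layer(torch.randn(2, 64, 128, 128))  # -> torch.Size([2, 64, 128, 128])
```

The design point this sketch captures is the complexity trade-off: standard spatial self-attention costs on the order of (HW)^2 per head, whereas attending across channels costs on the order of C^2, which is why the abstract can claim a large reduction in computation time.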
