High-Frequency Transformer Network Based on Window Cross-Attention for Pansharpening

Chengjie Ke, Xin Tian, Duidui Li, Hao Liang

doi:10.1109/icassp49357.2023.10096538

Abstract

Inspired by the vision transformer's ability to capture long-range dependencies, we propose a novel high-frequency transformer network based on window cross-attention that fuses panchromatic (PAN) and multispectral (MS) images into a high-resolution MS image. To overcome the limitations of shallow feature extraction in previous transformer-based fusion networks, we combine high-pass filtering with deep feature extraction to capture more texture information. As a result, the relationship between MS and PAN images obtained from feature similarity is more accurate. In particular, we build the cross-modality correlation with a window cross-attention mechanism that operates at the pixel level between local windows of the MS and PAN images. Compared with patch-level attention, pixel-level attention helps preserve fine-grained features. Therefore, more spatial details are transferred from the PAN image to the MS image, yielding a clearer fused MS image that preserves spectral information well. Experimental results demonstrate that the proposed method outperforms the comparison methods in both visual and quantitative quality.
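
To make the two ingredients named in the abstract more concrete, the following is a minimal NumPy sketch of high-pass filtering followed by pixel-level cross-attention inside local windows. It is an illustration under stated assumptions, not the network described in the paper: the box-blur high-pass filter, the window size, the absence of learned projections and deep feature extraction, and all function names (`high_pass`, `window_partition`, `window_reverse`, `window_cross_attention`) are hypothetical choices made here for clarity.

```python
import numpy as np

def high_pass(img, k=5):
    """Keep high frequencies by subtracting a k x k box blur (assumed filter)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    blurred = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            blurred += padded[dy:dy + img.shape[0], dx:dx + img.shape[1], :]
    return img - blurred / (k * k)

def window_partition(x, w):
    """Split (H, W, C) into non-overlapping w x w windows: (num_windows, w*w, C)."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def window_reverse(windows, w, H, W):
    """Inverse of window_partition: (num_windows, w*w, C) back to (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_cross_attention(ms_feat, pan_feat, w=4):
    """Each MS pixel attends to every PAN pixel in the same local window."""
    H, W, C = ms_feat.shape
    q = window_partition(ms_feat, w)    # queries from MS features
    k = window_partition(pan_feat, w)   # keys from PAN features
    v = k                               # values from PAN (assumption: no learned projections)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(C))  # (num_windows, w*w, w*w)
    return window_reverse(attn @ v, w, H, W), attn

# Toy inputs: an upsampled 4-band MS image and a 1-band PAN image (random data).
rng = np.random.default_rng(0)
ms_up = rng.random((8, 8, 4))
pan = rng.random((8, 8, 1))

ms_hp = high_pass(ms_up)
pan_hp = np.repeat(high_pass(pan), 4, axis=2)  # replicate PAN to match MS channels
detail, attn = window_cross_attention(ms_hp, pan_hp, w=4)
```

Because attention is computed per pixel rather than per patch, each row of `attn` weights the individual PAN pixels of one window for a single MS pixel, which is the granularity the abstract credits with preserving fine-grained features.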

Full Text