Remote sensing image super-resolution via cross-scale hierarchical transformer

Yi Xiao, Qiangqiang Yuan, Jiang He, Liangpei Zhang

doi:10.1080/10095020.2023.2288179

Abstract

Global and local modeling is essential for image super-resolution tasks. However, current efforts often lack explicit consideration of cross-scale knowledge in large-scale earth observation scenarios, resulting in suboptimal single-scale representations in global and local modeling. This work is motivated by two observations: 1) hierarchical features exist in both the local and global regions of remote sensing images, and 2) similar ground objects in these images appear at varying scales (e.g., cross-scale similarity). In light of these observations, this paper presents an effective method to capture the global and local image hierarchies by systematically exploring cross-scale correlation. Specifically, we develop a Cross-scale Self-Attention (CSA) to model the global features; it introduces an auxiliary token space to calculate cross-scale self-attention matrices, thus exploring global dependency across diverse token scales. To extract cross-scale localities, a Cross-scale Channel Attention (CCA) is devised, in which multi-scale features are explored and progressively incorporated into an enriched feature. Moreover, by hierarchically deploying CSA and CCA in transformer groups, the proposed Cross-scale Hierarchical Transformer (CHT) can effectively explore cross-scale representations in remote sensing images, leading to favorable reconstruction performance. Comprehensive experiments and analysis on four remote sensing datasets demonstrate the superiority of CHT in both simulated and real-world remote sensing scenes. In particular, CHT outperforms the state-of-the-art approach (TransENet) in PSNR by 0.11 dB on average while using only 54.8% of its parameters.
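The idea of attending over more than one token scale can be illustrated with a small sketch. This is not the authors' CSA implementation: it assumes the auxiliary token space is formed by average-pooling neighboring tokens, uses identity query/key/value projections instead of learned ones, and works on a 1-D token sequence. The names `avg_pool_tokens`, `cross_scale_attention`, and the `stride` parameter are hypothetical and chosen only for illustration.

```python
import math

def avg_pool_tokens(tokens, stride):
    """Average every `stride` consecutive tokens into one coarser token
    (assumed stand-in for an auxiliary token space)."""
    pooled = []
    for start in range(0, len(tokens), stride):
        group = tokens[start:start + stride]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_scale_attention(tokens, stride=2):
    """Each fine-scale token attends to the fine tokens plus the pooled tokens.
    Identity projections are an assumption; a real layer would learn them."""
    dim = len(tokens[0])
    keys = tokens + avg_pool_tokens(tokens, stride)  # two token scales side by side
    scale = 1.0 / math.sqrt(dim)                     # scaled dot-product attention
    outputs, attn = [], []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Values are the keys themselves because projections are identity here.
        outputs.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
        attn.append(weights)
    return outputs, attn

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
out, attn = cross_scale_attention(tokens, stride=2)
```

With four input tokens and `stride=2`, each attention row has six entries: four for the original tokens and two for the pooled ones, so every output token mixes information from both scales.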

Full Text