Transformer-based methods have demonstrated impressive performance in image super-resolution. However, when applied to large-scale Earth observation images, existing transformers face two significant challenges: (1) insufficient consideration of the spatial correlation between adjacent ground objects; and (2) a performance bottleneck caused by underutilization of the upsample module. To address these issues, we propose a novel distance-enhanced strip attention transformer (DESAT). DESAT integrates distance priors, which are easily obtained from remote sensing images, into the strip window self-attention mechanism to capture spatial correlations more effectively. To improve the transfer of deep features into high-resolution outputs, we design an attention-enhanced upsample block that combines the pixel shuffle layer with an attention-based upsample branch implemented through overlapping window self-attention. Additionally, to better simulate real-world scenarios, we construct a new cross-sensor super-resolution dataset using Gaofen-6 satellite imagery. Extensive experiments on both simulated and real-world remote sensing datasets show that DESAT outperforms state-of-the-art models by up to 1.17 dB and yields superior qualitative results. Furthermore, DESAT performs competitively on real-world tasks, effectively balancing spatial detail reconstruction and spectral transform, which makes it well suited to practical remote sensing super-resolution applications.
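As a rough illustration of how a distance prior can enter strip window self-attention, the sketch below subtracts a penalty proportional to the Euclidean pixel distance from each attention score inside a horizontal strip. This is only a minimal sketch under stated assumptions, not the DESAT formulation: the abstract does not specify the form of the prior, so the linear penalty, the `alpha` weight, the identity query/key/value projections, and the helper names `strip_windows` and `distance_biased_attention` are all hypothetical.

```python
import math


def strip_windows(height, width, strip_height):
    """Yield the (row, col) coordinates of each horizontal strip window."""
    for top in range(0, height, strip_height):
        bottom = min(top + strip_height, height)
        yield [(r, c) for r in range(top, bottom) for c in range(width)]


def distance_biased_attention(tokens, coords, alpha=0.1):
    """Single-head self-attention over one strip with a distance penalty.

    tokens: list of N feature vectors (lists of floats).
    coords: list of N (row, col) pixel positions, one per token.
    alpha:  assumed weight of the distance prior (hypothetical parameter).
    Queries, keys, and values are the tokens themselves (no learned projections).
    """
    dim = len(tokens[0])
    outputs = []
    for q, (qr, qc) in zip(tokens, coords):
        scores = []
        for k, (kr, kc) in zip(tokens, coords):
            similarity = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            distance = math.hypot(qr - kr, qc - kc)
            # Assumed prior: nearer pixels receive a smaller penalty.
            scores.append(similarity - alpha * distance)
        # Numerically stable softmax over the strip.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs
```

With `alpha=0` this reduces to plain self-attention within the strip, and increasing `alpha` concentrates each pixel's attention on its nearest neighbours, which is one simple way to encode the spatial correlation between adjacent ground objects.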