Abstract

Multiimage super-resolution (MISR), one of the most promising directions in remote sensing, has become a much-needed technique in the satellite market. A sequence of images collected by satellites typically contains many views acquired over a long time span, so fusing multiple low-resolution views into a single detailed high-resolution image is a challenging problem. However, most deep-learning-based MISR methods cannot make full use of the multiple images: their fusion modules adapt poorly to image sequences with weak temporal correlations. To cope with these problems, we propose a novel end-to-end framework called TR-MISR. It consists of three parts: an encoder based on residual blocks, a transformer-based fusion module, and a decoder based on subpixel convolution. Specifically, by rearranging multiple feature maps into vectors, the fusion module can assign dynamic attention to the same area of different satellite images simultaneously. In addition, TR-MISR adopts an extra learnable embedding vector that fuses these vectors to restore as much detail as possible. TR-MISR applies the transformer to MISR tasks for the first time, notably reducing the difficulty of training the transformer by ignoring the spatial relations of image patches. Extensive experiments on the PROBA-V Kelvin dataset demonstrate the superiority of the proposed model and suggest an effective way to apply transformers to other low-level vision tasks.
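As a rough illustration of the three-part design described above, the following PyTorch sketch wires together a residual-block encoder, a transformer-based fusion module with a learnable fusion embedding, and a subpixel-convolution decoder. It is a minimal sketch under stated assumptions: all layer counts, channel widths, and names (ResidualBlock, TRMISRSketch, fusion_token) are illustrative, not the authors' exact configuration.

```python
# Minimal sketch of the encoder -> fusion -> decoder pipeline (illustrative, not the paper's code).
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class TRMISRSketch(nn.Module):
    def __init__(self, channels=64, num_heads=8, depth=6, scale=3):
        super().__init__()
        # Encoder: shared residual blocks applied to every low-resolution view.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1),
            ResidualBlock(channels),
            ResidualBlock(channels),
        )
        # Fusion: transformer over the temporal axis at each spatial location,
        # with a learnable fusion token that collects the fused representation.
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=num_heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=depth)
        self.fusion_token = nn.Parameter(torch.zeros(1, 1, channels))
        # Decoder: subpixel convolution (PixelShuffle) for the final upsampling.
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr_seq):
        # lr_seq: (B, T, 1, H, W), a sequence of T low-resolution views.
        b, t, c, h, w = lr_seq.shape
        feats = self.encoder(lr_seq.reshape(b * t, c, h, w)).reshape(b, t, -1, h, w)
        # Rearrange feature maps into per-pixel vectors: (B*H*W, T, C).
        tokens = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, -1)
        tokens = torch.cat([self.fusion_token.expand(b * h * w, -1, -1), tokens], dim=1)
        fused = self.fusion(tokens)[:, 0]                  # keep only the fusion token
        fused = fused.reshape(b, h, w, -1).permute(0, 3, 1, 2)
        return self.decoder(fused)                         # (B, 1, scale*H, scale*W)


# Example: fuse a batch of 9 low-resolution 32x32 views into one 96x96 output.
sr = TRMISRSketch()(torch.randn(2, 9, 1, 32, 32))
print(sr.shape)  # torch.Size([2, 1, 96, 96])
```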

Highlights

  • Image super-resolution, as one of the critical technologies in computer vision, aims to convert low-resolution images into high-resolution images

  • To address the problems of multiimage super-resolution (MISR), we propose a novel end-to-end network based on transformers, namely, TR-MISR

  • It is worth mentioning that when we set (k, N, p, M) to (32, 2, 8, 6) and use the entire training set to train for 400 epochs on each band, TR-MISR reaches an Rscore of 0.93001 on the test set of the PROBA-V challenge and places at the top of the leaderboard

Summary

INTRODUCTION

Image super-resolution, as one of the critical technologies in computer vision, aims to convert low-resolution images into high-resolution images. MISR requires focusing on specific areas of images regardless of their position, which means that the existing attention modules [27], [30] need to be improved. TR-MISR does not require pretraining, because the fusion module reduces the training difficulty of the transformer by preventing it from learning the spatial relations between different image patches. This advantage alleviates the problem of insufficient MISR data in remote sensing. With our proposed feature rearrangement module, TR-MISR can simultaneously focus on all image patches and adapt to multiple images with weak temporal correlations, as illustrated in the sketch below. In this way, TR-MISR can accommodate image sequences of any length, which notably improves the utilization of multiple images.
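A minimal, self-contained sketch of this any-length property, assuming a plain nn.TransformerEncoder as the fusion module: because attention runs over the temporal axis at each spatial location and no patch-position encoding is used, the same module accepts image sequences of different lengths without modification. The layer sizes and the fusion_token name are illustrative assumptions.

```python
# Illustrative check (not the authors' code): one fusion module, several sequence lengths.
import torch
import torch.nn as nn

channels, heads = 64, 8
layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads, batch_first=True)
fusion = nn.TransformerEncoder(layer, num_layers=2)
fusion_token = torch.zeros(1, 1, channels)     # a learnable nn.Parameter during training

for num_views in (5, 9, 16):                   # image sequences of varying length
    pixels = 32 * 32                           # one token sequence per spatial location
    tokens = torch.randn(pixels, num_views, channels)
    tokens = torch.cat([fusion_token.expand(pixels, -1, -1), tokens], dim=1)
    fused = fusion(tokens)[:, 0]               # fused features, independent of length
    print(num_views, tuple(fused.shape))       # 5, 9, and 16 views all yield (1024, 64)
```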

RELATED WORK
Transformer
Single Image Super-Resolution
Video Super-Resolution
Multiimage Super-Resolution
Structure of Transformer
TR-MISR Framework
Loss Function
EXPERIMENTS
PROBA-V Kelvin Dataset
Evaluation Metric
Experimental Settings
Comparing Methods
Analysis of Fusion Modules
Attention Visualization
Findings
CONCLUSION