Non-local sparse attention based swin transformer V2 for image super-resolution

Ningning Lv, Fuxiang Lu, Min Yuan, Kun Zhan, Yufei Xie

doi:10.1016/j.sigpro.2024.109542

Abstract

In single-image super-resolution (SISR), distortion metrics (such as PSNR and SSIM) and perceptual-quality metrics (such as PI and NIQE) are in conflict. Methods that score well on perceptual quality often score poorly on distortion metrics, and vice versa. In this article, we propose a three-stage method that balances the two. First, we propose NLSAV2, an image super-resolution model that targets the PSNR and SSIM metrics. NLSAV2 consists of three modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. In the shallow feature extraction module, non-local sparse attention identifies the most informative features while the input is mapped from a low-dimensional space to a high-dimensional one. The deep feature extraction module consists mainly of residual Swin Transformer V2 blocks. Second, NLSAV2 is used as the generator and a relative discriminator is introduced to train the model further; the resulting model is called NLSAV2-GAN. The experimental results indicate that NLSAV2 has the advantage in distortion metrics and NLSAV2-GAN in perceptual quality. Finally, network interpolation and image interpolation are used to continuously adjust the reconstruction style and smoothness, achieving a balance between the two.
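The final balancing stage can be illustrated with a minimal NumPy sketch. Network interpolation blends the weights of the PSNR-oriented model (NLSAV2) with those of the GAN-trained model (NLSAV2-GAN), parameter by parameter. Image interpolation instead blends the two models' super-resolved outputs pixel by pixel. This sketch makes several assumptions not stated in the abstract. The function names `interpolate_networks` and `interpolate_images` are hypothetical. A plain dict of arrays stands in for a model's parameters. The convention that `alpha = 0` gives the PSNR-oriented result and `alpha = 1` the GAN result is an illustrative choice, not necessarily the authors' own.

```python
import numpy as np
from typing import Dict


def interpolate_networks(psnr_params: Dict[str, np.ndarray],
                         gan_params: Dict[str, np.ndarray],
                         alpha: float) -> Dict[str, np.ndarray]:
    """Network interpolation: linearly blend two parameter sets of the same architecture.

    Hypothetical helper; alpha = 0 returns the PSNR-oriented weights,
    alpha = 1 returns the GAN-trained weights (assumed convention).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Interpolation only makes sense if both models share every parameter name.
    if psnr_params.keys() != gan_params.keys():
        raise ValueError("both models must share the same parameter names")
    blended = {}
    for name, w_psnr in psnr_params.items():
        w_gan = gan_params[name]
        if w_psnr.shape != w_gan.shape:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # theta_interp = (1 - alpha) * theta_PSNR + alpha * theta_GAN
        blended[name] = (1.0 - alpha) * w_psnr + alpha * w_gan
    return blended


def interpolate_images(sr_psnr: np.ndarray, sr_gan: np.ndarray, alpha: float) -> np.ndarray:
    """Image interpolation: pixel-wise blend of two super-resolved outputs.

    Hypothetical helper; assumes both images have the same shape and
    intensities normalised to [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if sr_psnr.shape != sr_gan.shape:
        raise ValueError("images must have the same shape")
    out = (1.0 - alpha) * sr_psnr.astype(np.float64) + alpha * sr_gan.astype(np.float64)
    # Keep the result in the valid intensity range.
    return np.clip(out, 0.0, 1.0)


if __name__ == "__main__":
    # Toy "models" with a single 2x2 weight tensor each.
    psnr = {"conv.weight": np.zeros((2, 2))}
    gan = {"conv.weight": np.ones((2, 2))}
    mixed = interpolate_networks(psnr, gan, alpha=0.25)
    print(mixed["conv.weight"][0, 0])  # prints 0.25
```

Sweeping `alpha` from 0 to 1 traces a continuous path from the distortion-oriented result to the perception-oriented one. Network interpolation needs a single forward pass through the blended model. Image interpolation needs both models to run but leaves their weights untouched.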
