Abstract

Swin Transformers have been designed and used in various image super-resolution (SR) applications. One recent image restoration method is RSTCANet, which combines the Swin Transformer with Channel Attention. However, for image channels that carry little useful information or mainly noise, Channel Attention cannot automatically learn that these channels are insignificant; instead, it adjusts their weights to enhance their expressive capability, which may lead to excessive focus on noise while neglecting more essential features. In this paper, we propose a new image SR method, RSVTCANet, based on an extension of Swin2SR. Specifically, to effectively gather global information for each image channel, we modify the Residual SwinV2 Transformer blocks in Swin2SR by introducing coordinate attention after every two successive SwinV2 Transformer Layers (S2TL) and replacing Multi-head Self-Attention (MSA) with Efficient Multi-head Self-Attention version 2 (EMSAv2), employing the resulting residual SwinV2 Transformer coordinate attention blocks (RSVTCABs) for feature extraction. Additionally, to improve the generalization of RSVTCANet during training, we apply an optimized RandAugment for data augmentation on the training dataset. Extensive experimental results show that RSVTCANet outperforms recent image SR methods in visual quality and in measures such as PSNR and SSIM.
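The coordinate attention mentioned above differs from Channel Attention in that it pools along each spatial axis separately, so the resulting gates retain positional information rather than a single per-channel scalar. The following is a minimal NumPy sketch of that idea, not the paper's implementation; the projection matrices `w_h` and `w_w` are hypothetical stand-ins for the 1x1 convolutions used in the original coordinate attention design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_h, w_w):
    """Toy coordinate attention on a feature map x of shape (C, H, W).

    Pools over width and over height separately, so the attention
    weights keep row/column position information, then gates the
    input per row and per column. w_h and w_w are (C, C) projection
    matrices (stand-ins for the 1x1 convolutions in the full design).
    """
    C, H, W = x.shape
    pool_h = x.mean(axis=2)            # (C, H): average over width
    pool_w = x.mean(axis=1)            # (C, W): average over height
    attn_h = sigmoid(w_h @ pool_h)     # (C, H): per-row gates in (0, 1)
    attn_w = sigmoid(w_w @ pool_w)     # (C, W): per-column gates in (0, 1)
    # Broadcast the two gate maps back over the feature map.
    return x * attn_h[:, :, None] * attn_w[:, None, :]
```

Because both gate maps lie in (0, 1), uninformative rows or columns can be suppressed toward zero, which is the behavior the abstract argues plain Channel Attention lacks.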
