Abstract

In this paper, we propose a Spatial-Temporal Transformer (STTF) algorithm for video super-resolution (SR) that addresses the blur and artifacts produced when low-resolution (LR) video is super-resolved with traditional algorithms. First, residual blocks extract initial features from the video sequence. Second, the three-dimensional video features are decomposed into image patches, which are fed to the Spatial-Temporal Transformer network, where self-attention among patches aligns and fuses them. Finally, a sub-pixel convolution layer and residual layers up-sample the features and reconstruct the high-resolution (HR) video sequence. To improve visual quality, the network is trained with a mean square error (MSE) loss. Experimental results show that the STTF network achieves higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) than traditional super-resolution algorithms.
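Two components of the pipeline above can be illustrated concretely: self-attention over patch tokens (the core of the transformer stage) and the sub-pixel convolution's channel-to-space rearrangement (the up-sampling stage). The following is a minimal NumPy sketch, not the paper's implementation; the token dimensions, projection matrices, and scale factor are illustrative assumptions.

```python
import numpy as np

def patch_self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over patch tokens.

    tokens: (N, D) array of N patch embeddings of dimension D.
    wq, wk, wv: (D, D) projection matrices (here random, for illustration).
    """
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ v  # each output token is a weighted fusion of all patches

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (C*r^2, H, W) -> (C, H*r, W*r).

    Channels are interleaved into the spatial dimensions, as in
    sub-pixel convolution up-sampling.
    """
    c2, h, w = x.shape
    assert c2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)
```

For example, with an upscaling factor r = 2, a feature map of shape (4, H, W) becomes a single-channel image of shape (1, 2H, 2W); each group of four channels supplies the four sub-pixel positions of one output pixel.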
