Abstract

Video super-resolution (VSR) aims at generating high-resolution (HR) video frames with plausible and temporally consistent details from their low-resolution (LR) counterparts and neighboring frames. The key challenge for VSR lies in the effective exploitation of intra-frame spatial relations and the temporal dependency between consecutive frames. Many existing techniques utilize spatial and temporal information separately and compensate motion via alignment. These methods cannot fully exploit the spatio-temporal information that significantly affects the quality of the resultant HR videos. In this work, a novel deformable spatio-temporal convolutional residual network (DSTnet) is proposed to overcome the issues of separate motion estimation and compensation methods for VSR. The proposed framework consists of 3D convolutional residual blocks decomposed into spatial and temporal (2+1)D streams. This decomposition can simultaneously utilize the input video's spatial and temporal features without a separate motion estimation and compensation module. Furthermore, deformable convolution layers are used in the proposed model to enhance its motion-awareness capability. Our contribution is twofold: first, the proposed approach can overcome the challenges in modeling complex motions by efficiently using spatio-temporal information; second, the proposed model has fewer parameters to learn than state-of-the-art methods, making it a computationally lean and efficient framework for VSR. Experiments are conducted on the benchmark Vid4 dataset to evaluate the efficacy of the proposed approach. The results demonstrate that the proposed approach achieves superior quantitative and qualitative performance compared to state-of-the-art methods.
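To make the (2+1)D decomposition and the deformable-convolution idea concrete, the following PyTorch sketch factorizes a 3D residual block into a spatial (1 x k x k) convolution followed by a temporal (k x 1 x 1) convolution, and applies a deformable 2D convolution whose sampling offsets are predicted from the features. The module names, channel widths, and offset-prediction head are illustrative assumptions, not the authors' exact DSTnet configuration.

```python
# Minimal sketch of a (2+1)D residual block plus deformable convolution,
# assuming PyTorch with torchvision; all hyperparameters are placeholders.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class R2Plus1DBlock(nn.Module):
    """Residual block with a 3D convolution factorized into (2+1)D streams.

    A k x k x k 3D kernel is replaced by a 1 x k x k spatial convolution
    followed by a k x 1 x 1 temporal convolution, so spatial and temporal
    features are learned jointly but with fewer parameters.
    """

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        p = k // 2
        self.spatial = nn.Conv3d(channels, channels,
                                 kernel_size=(1, k, k), padding=(0, p, p))
        self.temporal = nn.Conv3d(channels, channels,
                                  kernel_size=(k, 1, 1), padding=(p, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (N, C, T, H, W)
        out = self.relu(self.spatial(x))
        out = self.temporal(out)
        return self.relu(out + x)  # residual connection


class DeformableFusion(nn.Module):
    """Per-frame deformable convolution; offsets predicted from the features."""

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        # DeformConv2d expects 2 * k * k offset channels (x/y per kernel tap).
        self.offset = nn.Conv2d(channels, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(channels, channels, kernel_size=k, padding=k // 2)

    def forward(self, x):  # x: (N, C, H, W), one frame of the clip
        return self.deform(x, self.offset(x))


if __name__ == "__main__":
    clip = torch.randn(1, 64, 5, 32, 32)       # batch of 5-frame feature clips
    feats = R2Plus1DBlock(64)(clip)            # joint spatio-temporal features
    center = feats[:, :, feats.size(2) // 2]   # center frame, (N, C, H, W)
    print(DeformableFusion(64)(center).shape)  # torch.Size([1, 64, 32, 32])
```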

Highlights

  • In recent years, image and video super-resolution have attracted a lot of attention due to their wide range of applications, including, but not limited to, medical image reconstruction, remote sensing, panorama video super-resolution, UAV surveillance, and high-definition television (HDTV) [1,2,3]

  • This paper proposes an end-to-end deformable spatio-temporal convolutional residual network (DSTnet) for video super-resolution (VSR), adopting ResNet as the underlying architecture

  • The proposed method is compared with several single image super-resolution methods and VSR methods, including VESPCN [27], RCAN [22], VSRnet [24], SOF-VSR [25], BRCN [29], DBPN [31], VSRResNet [32], TOFlow [6], and the temporally deformable alignment network (TDAN) [8], on the benchmark Vid4 dataset [47]


Summary

Introduction

Image and video super-resolution have attracted a lot of attention due to their wide range of applications, including, but not limited to, medical image reconstruction, remote sensing, panorama video super-resolution, UAV surveillance, and high-definition television (HDTV) [1,2,3]. Many existing VSR techniques first estimate motion between frames and then compensate for it via alignment; however, the standard optical flow computed by these algorithms is not an optimal motion representation for video restoration tasks, including VSR [6]. To address this issue, several methods proposed using local spatio-temporal information between frames to capture optical flow for motion estimation [7,8]. These methods employ spatial and temporal information separately, extracting features from individual frames and then estimating movement in a separate motion compensation step. As a result, they fail to fully exploit the discriminative spatio-temporal information in the input LR frames, which reduces the coherence of the reconstructed HR videos [9].
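To make the contrast with this conventional two-stage pipeline concrete, the sketch below shows explicit motion compensation: an estimated flow field (here a placeholder tensor; in practice predicted by a dedicated motion-estimation network) is used to backward-warp a neighboring frame toward the reference frame via bilinear sampling. The `warp` helper is a generic illustration of the approach the paragraph criticizes, not any specific method from the cited works.

```python
# Generic flow-based motion compensation: estimate flow, then warp.
import torch
import torch.nn.functional as F


def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` (N, C, H, W) with per-pixel `flow` (N, 2, H, W)."""
    n, _, h, w = frame.shape
    # Base sampling grid in pixel coordinates, (1, 2, H, W) as (x, y).
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)
    coords = grid + flow
    # Normalize to [-1, 1]; grid_sample expects the grid as (N, H, W, 2).
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(frame, coords.permute(0, 2, 3, 1), align_corners=True)


neighbor = torch.randn(1, 3, 64, 64)  # neighboring LR frame
flow = torch.zeros(1, 2, 64, 64)      # stand-in for an estimated flow field
compensated = warp(neighbor, flow)    # aligned toward the reference frame
```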

Related Work
Deformable Convolution-Based Methods
Spatio-Temporal Convolutional Residual Blocks
Deformable Spatio-Temporal Convolutional Residual Block
Temporal Fusion
SR Reconstruction
Experiment Details
Results
Method
Conclusions and Future Work