Abstract

Although deep learning-based approaches to video processing have been extensively investigated, the lack of generality in network design limits their practical application, particularly in video restoration. To address this, this paper presents a universal video restoration model that can tackle video inpainting and super-resolution simultaneously. The network, called Video-Restoration-Net (VRN), consists of four components: (1) an encoder that extracts features from each frame, (2) a non-local network that recombines features from adjacent frames or from different locations within a given frame, (3) a decoder that restores a coarse video from the output of the non-local block, and (4) a refinement network that refines the coarse video at the frame level. The framework is trained in a three-step pipeline to improve training stability on both tasks. Specifically, we first propose an automated technique to generate a full video dataset for super-resolution reconstruction and a complete-incomplete video dataset for inpainting. A VRN is then trained to inpaint the incomplete videos. Meanwhile, the full video dataset is used to train another VRN frame-wise, which is validated on standard benchmark datasets. We report quantitative comparisons with several baseline models, achieving 40.5042 dB/0.99473 (PSNR/SSIM) on the inpainting task, and 28.41 dB/0.7953 and 27.25 dB/0.8152 on BSD100 and Urban100, respectively, for the super-resolution task. Qualitative comparisons demonstrate that the proposed model completes masked regions and performs super-resolution reconstruction at high quality. Together, these results show that our method is more versatile across video inpainting and super-resolution tasks than recent models.
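For concreteness, below is a minimal PyTorch sketch of the four-component VRN design described above. All module names, layer widths, and the choice of a per-frame self-attention non-local block are our assumptions for illustration: the paper's non-local network also aggregates features across adjacent frames, and the super-resolution variant would additionally upsample in the decoder (e.g., via pixel shuffle), both of which are omitted here.

import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    # Recombines features across spatial locations via self-attention.
    # (Hypothetical embodiment; applied per frame here for brevity.)
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, 1)  # query projection
        self.phi = nn.Conv2d(channels, channels // 2, 1)    # key projection
        self.g = nn.Conv2d(channels, channels // 2, 1)      # value projection
        self.out = nn.Conv2d(channels // 2, channels, 1)

    def forward(self, feats):                      # feats: (B*T, C, H, W)
        b, c, h, w = feats.shape
        q = self.theta(feats).flatten(2).transpose(1, 2)  # (B*T, HW, C/2)
        k = self.phi(feats).flatten(2)                    # (B*T, C/2, HW)
        v = self.g(feats).flatten(2).transpose(1, 2)      # (B*T, HW, C/2)
        attn = torch.softmax(q @ k / (c // 2) ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return feats + self.out(y)                 # residual connection

class VRN(nn.Module):
    # Encoder -> non-local block -> decoder (coarse video) -> frame-level refinement.
    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.non_local = NonLocalBlock(channels)
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1))
        self.refine = nn.Sequential(               # refines each coarse frame
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1))

    def forward(self, video):                      # video: (B, T, 3, H, W)
        b, t, c, h, w = video.shape
        x = video.reshape(b * t, c, h, w)          # fold frames into the batch
        feats = self.non_local(self.encoder(x))
        coarse = self.decoder(feats)
        return (coarse + self.refine(coarse)).reshape(b, t, c, h, w)

# Usage: restored = VRN()(torch.randn(1, 5, 3, 64, 64)) yields a tensor of the
# same shape; the inpainting setting would feed masked frames as input.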
