Abstract

Deep spatio-temporal neural networks have shown promising performance for video super-resolution (VSR) in recent years. However, most of them rely heavily on accurate motion estimation. In this paper, we propose a novel spatio-temporal matching network (STMN) for video super-resolution, which operates in the wavelet domain to reduce the dependence on motion estimation. Specifically, our STMN consists of three major components: a temporal fusion wavelet network (TFWN), a non-local matching network (NLMN), and a global wavelet domain residual connection (GWDRC). TFWN adaptively extracts temporal fusion wavelet maps via three 3D convolutional layers and a discrete wavelet transform (DWT) decomposition layer. The extracted temporal fusion wavelet maps are rich in spatial information and in knowledge of different frequencies from consecutive frames, and they are fed to NLMN to learn deep wavelet representations. NLMN integrates super-resolution and denoising into a unified module by pyramidally stacking non-local matching residual blocks (NLMRB). Finally, GWDRC reconstructs the super-resolved frames from the deep wavelet representations by using global wavelet domain residual information. Consequently, our STMN can efficiently enhance reconstruction quality by capturing wavelet representations of different frequencies in consecutive frames, and it does not require any motion compensation. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of our method compared with state-of-the-art methods.
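
To illustrate the kind of decomposition a DWT layer performs, the following is a minimal sketch of a single-level 2D DWT and its inverse on one frame. The abstract does not state which wavelet is used, so the Haar wavelet here is an assumption, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT (assumed wavelet; not stated in the abstract).

    x: 2D array with even height and width.
    Returns four half-resolution subbands: one low-frequency approximation
    and three high-frequency detail bands.
    """
    x = np.asarray(x, dtype=np.float64)
    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0      # low-pass in both directions
    detail_cols = (a - b + c - d) / 2.0  # differences across columns
    detail_rows = (a + b - c - d) / 2.0  # differences across rows
    detail_diag = (a - b - c + d) / 2.0  # diagonal differences
    return approx, detail_cols, detail_rows, detail_diag


def haar_idwt2(approx, detail_cols, detail_rows, detail_diag):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (approx + detail_cols + detail_rows + detail_diag) / 2.0
    x[0::2, 1::2] = (approx - detail_cols + detail_rows - detail_diag) / 2.0
    x[1::2, 0::2] = (approx + detail_cols - detail_rows - detail_diag) / 2.0
    x[1::2, 1::2] = (approx - detail_cols - detail_rows + detail_diag) / 2.0
    return x


# Toy 4x4 "frame" standing in for a video frame or feature map.
frame = np.arange(16, dtype=np.float64).reshape(4, 4)
bands = haar_dwt2(frame)
restored = haar_idwt2(*bands)
```

Because the transform is invertible, a network that predicts the four subbands loses no information relative to predicting pixels directly, while the low- and high-frequency content is handled in separate channels.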
