Abstract

Video super-resolution (VSR) is a classical computer vision task, and deep-learning-based methods have achieved promising performance in recent years. Most previous VSR methods rely on optical flow to perform alignment. However, under occlusion and large motion, accurate optical flow is difficult to obtain, which degrades the final restoration results. In addition, how to effectively fuse spatio-temporal information across frames is also a key problem in VSR, yet most existing approaches ignore the value of local and/or global spatio-temporal information. In this paper, we propose a novel Spatio-Temporal Fusion Network (STFN), consisting of a feature extraction module, an alignment module, a fusion module, and a reconstruction module, to address these challenges. Specifically, we propose an enhanced cascaded feature fusion block (ECFFB) in the alignment module, which predicts the offsets of deformable convolution (DC) more precisely and thus handles large motions effectively. Furthermore, we design a fusion module comprising a local temporal fusion block (LTFB) and a pyramidal spatio-temporal fusion block (PSTF) in a single-branch manner. These two blocks exploit local and global spatio-temporal information, respectively, to achieve better fusion results. Experimental results demonstrate that our STFN significantly outperforms state-of-the-art methods on several datasets.
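The abstract does not detail the ECFFB, but the core operation behind deformable-convolution-based alignment is sampling a neighbor frame's features at positions displaced by predicted offsets. The sketch below illustrates that offset-guided sampling in a simplified form, assuming a single (dy, dx) offset per spatial location; an actual deformable convolution predicts one offset per kernel tap and combines the samples with learned weights. All function names here are hypothetical, and the code is NumPy-only for self-containment.

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Bilinearly sample a (H, W) feature map at float coordinates (ys, xs)."""
    H, W = feat.shape
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    wy = ys - y0  # fractional parts used as interpolation weights
    wx = xs - x0
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def align_with_offsets(neighbor, offsets):
    """Warp a neighbor-frame feature map toward the reference frame using
    per-pixel (dy, dx) offsets, the sampling step behind DC-based alignment."""
    H, W = neighbor.shape
    ys, xs = np.meshgrid(np.arange(H, dtype=float),
                         np.arange(W, dtype=float), indexing="ij")
    return bilinear_sample(neighbor, ys + offsets[0], xs + offsets[1])
```

As a usage check, a neighbor frame shifted one pixel to the right is realigned to the reference by a uniform offset of dx = +1 (border columns are clamped and may differ).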
