Abstract

Given a video sequence, video frame interpolation aims to synthesize an intermediate frame between two consecutive frames. In this paper, we propose a multi-scale position feature transform (MS-PFT) network for video frame interpolation, in which two parallel prediction networks predict the features of the target frame and one optimization network generates the final interpolation result. To increase the fidelity of the synthesized frames, we apply a position feature transform (PFT) layer in the residual blocks of the prediction networks to estimate scaling factors that weight the importance of deep features around a target pixel. A PFT layer uses optical flow to extract and generate position features, which in turn adjust the learning process of our model. We further extend our model into a multi-scale structure in which every scale of the network shares the same parameters, maximizing efficiency while keeping the model size unchanged. Experiments show that our method handles challenging scenarios such as occlusion and large motion effectively and outperforms state-of-the-art approaches on different datasets.
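The abstract does not specify the exact architecture of the PFT layer, but its description (flow-derived position features producing per-pixel scaling factors inside a residual block) resembles a feature-wise modulation in the spirit of SFT/FiLM. The following is a minimal sketch under that assumption; the class names `PFTLayer` and `PFTResBlock`, the convolution configuration, and the sigmoid gating are all hypothetical choices for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class PFTLayer(nn.Module):
    """Hypothetical position feature transform (PFT) layer.

    Maps an optical-flow field to position features, then predicts
    per-pixel scaling factors that re-weight the feature maps of a
    residual block (an assumed SFT/FiLM-style modulation).
    """

    def __init__(self, channels: int, flow_channels: int = 2):
        super().__init__()
        # Extract position features from the optical flow field.
        self.pos_extract = nn.Sequential(
            nn.Conv2d(flow_channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Predict per-pixel scaling factors from the position features.
        self.scale_pred = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        pos = self.pos_extract(flow)
        scale = torch.sigmoid(self.scale_pred(pos))  # scaling factors in (0, 1)
        return feat * scale  # weight features around the target pixel


class PFTResBlock(nn.Module):
    """Residual block with a PFT layer inserted, as the abstract describes."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.pft = PFTLayer(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(feat))
        out = self.pft(out, flow)  # modulate features by flow-derived position cues
        out = self.conv2(out)
        return feat + out  # residual connection


if __name__ == "__main__":
    block = PFTResBlock(64)
    feat = torch.randn(1, 64, 128, 128)   # deep features of one prediction network
    flow = torch.randn(1, 2, 128, 128)    # optical flow toward the target frame
    out = block(feat, flow)
    print(out.shape)  # torch.Size([1, 64, 128, 128])
```

Sharing this block's parameters across every scale of a multi-scale pyramid, as the abstract states, would keep the model size constant regardless of the number of scales.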
