Abstract
We present a simple but effective video interpolation framework that can be applied to various types of videos, including conventional videos and 360° videos. Our main idea is to predict the latent feature of an intermediate frame through latent feature encoders placed between the encoder and decoder networks, without explicitly computing optical flow or depth maps. The latent feature encoders take the latent features of the input images and predict the latent feature of a target image, i.e., an intermediate frame. Afterward, the decoder network reconstructs the target image from the latent feature. The proposed framework consists of fully convolutional networks, and it is therefore end-to-end trainable from scratch without requiring any information other than consecutive frames. We experimentally verify the superiority of the proposed method by comparing it to state-of-the-art methods on various types of datasets. Moreover, an ablation study is carried out to analyze the key components of the proposed method. Because the proposed method performs interpolation in the latent domain, it can be applied to various video interpolation tasks (e.g., NIR and depth videos) without restricting the type of input data.
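To make the encoder / latent-feature-encoder / decoder pipeline concrete, below is a minimal PyTorch sketch of the described data flow: two input frames are encoded into latent features, a latent feature encoder fuses them to predict the intermediate frame's latent, and a decoder reconstructs the frame. All module names, channel counts, and layer depths here are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of the latent-domain interpolation pipeline.
# Layer choices and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input frame to a latent feature map (fully convolutional)."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class LatentFeatureEncoder(nn.Module):
    """Predicts the intermediate frame's latent feature from the latents of
    the two input frames, with no optical flow or depth computation."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )
    def forward(self, z0, z1):
        return self.net(torch.cat([z0, z1], dim=1))

class Decoder(nn.Module):
    """Reconstructs the target (intermediate) frame from its latent feature."""
    def __init__(self, feat_ch=64, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat_ch, out_ch, 4, stride=2, padding=1),
        )
    def forward(self, z):
        return self.net(z)

# End-to-end pass: interpolate frame_t from consecutive frames frame_0, frame_1.
encoder, latent_enc, decoder = Encoder(), LatentFeatureEncoder(), Decoder()
frame_0 = torch.randn(1, 3, 128, 128)
frame_1 = torch.randn(1, 3, 128, 128)
z_t = latent_enc(encoder(frame_0), encoder(frame_1))
frame_t = decoder(z_t)  # reconstructed intermediate frame, same size as inputs
```

Because every module is convolutional, such a pipeline is agnostic to the spatial size and modality of the input frames, which is consistent with the claim that the method extends to 360°, NIR, and depth videos.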
Highlights
Recent advances in deep learning have significantly improved the performance of video interpolation, and state-of-the-art methods show promising results on benchmark datasets [1], [2]
We split previous studies into two groups: the first explicitly utilizes optical flow as guiding information, while the second directly predicts an intermediate frame without such additional information
In this work, we propose a fully convolutional video interpolation framework that can be trained on arbitrary videos
Summary
Recent advances in deep learning have significantly improved the performance of video interpolation, and state-of-the-art methods show promising results on benchmark datasets [1], [2]. This technology can be applied to various applications, including frame rate up-conversion [3], video compression [4], view synthesis [5], [6], and motion deblurring [7]. We split previous studies into two groups: the first explicitly utilizes optical flow as guiding information, while the second directly predicts an intermediate frame without such additional information. For the sake of clarity, we refer to the first as the guided approach and the second as the direct approach.