Abstract

We propose a generative framework for video frame interpolation. Conventionally, the problem is solved with optical-flow methods, whose perceptual quality depends on the accuracy of flow estimation; a merit of these traditional methods, however, is their remarkable generalization ability. Recently, deep convolutional neural networks (CNNs) have achieved good performance at the price of computation. However, deploying a CNN requires training on a large-scale dataset beforehand, not to mention fine-tuning and adaptation afterwards. Moreover, despite producing sharp motion, their perceptual quality does not correlate well with their performance on pixel-to-pixel difference metrics because of artifacts created by erroneous warping. In this paper, we take advantage of both conventional and deep-learning models and tackle the problem from a different perspective. Our framework, which we call deep locally temporal embedding (DeepLTE), is powered by a deep CNN yet can be used instantly, like conventional models. DeepLTE fits an auto-encoding CNN to several consecutive frames and imposes constraints on the latent representations so that new frames can be generated by interpolating new latent codes. Unlike the prevailing deep-learning paradigm, which requires training on large datasets, DeepLTE works in a plug-and-play, unsupervised manner and can generate an arbitrary number of frames from multiple given consecutive frames. We demonstrate that, without bells and whistles, DeepLTE outperforms existing state-of-the-art models in terms of perceptual quality.
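The core idea of generating frames by interpolating latent codes can be illustrated with a minimal, hypothetical sketch in which a linear PCA auto-encoder stands in for the paper's CNN: a few consecutive toy frames are encoded, the latent codes of two neighbors are averaged, and the decoded result is an in-between frame. All names and the toy data below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy "frames": a bright 2x2 square translating one pixel per frame on an 8x8 grid.
def make_frame(x):
    f = np.zeros((8, 8))
    f[3:5, x:x + 2] = 1.0
    return f.ravel()

frames = np.stack([make_frame(x) for x in (1, 2, 3, 4)])  # 4 consecutive frames

# A minimal linear "auto-encoder": PCA via SVD stands in for the CNN
# encoder/decoder fitted to the given frames.
mean = frames.mean(axis=0)
U, S, Vt = np.linalg.svd(frames - mean, full_matrices=False)
k = 3  # latent dimensionality
encode = lambda f: (f - mean) @ Vt[:k].T  # frame -> latent code
decode = lambda z: z @ Vt[:k] + mean      # latent code -> frame

# Synthesize an in-between frame by interpolating the latent codes
# of the second and third frames.
z_mid = 0.5 * (encode(frames[1]) + encode(frames[2]))
mid_frame = decode(z_mid).reshape(8, 8)
```

Because the toy frames (after centering) have rank at most 3, the PCA auto-encoder reconstructs them exactly; the real method replaces this linear map with a deep auto-encoding CNN fitted per sequence.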

Highlights

  • With the advances in video-capturing devices, there has been an increasing demand for high-quality videos

  • We considered three variants of deep locally temporal embedding (DeepLTE): DeepLTE-1, DeepLTE-2, and DeepLTE-3, corresponding to first-, second-, and third-order latent interpolation, respectively

  • We optimized the weights of the convolutional neural networks (CNNs) based on these reconstructions and interpolations
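The three interpolation orders named in the highlights can be sketched as polynomial interpolation of latent codes over time: given codes at integer frame times, DeepLTE-k would fit a degree-k polynomial per latent dimension and evaluate it at the query time. This is a hedged illustration of the stated orders, not the paper's exact formulation; `interpolate_latents` and the toy codes are hypothetical.

```python
import numpy as np

def interpolate_latents(codes, t, order):
    """Order-`order` polynomial interpolation of latent codes.

    codes: (n_frames, dim) latent vectors observed at integer times 0..n-1.
    t: fractional query time; order: polynomial degree (1, 2, or 3).
    """
    n, dim = codes.shape
    times = np.arange(n)
    out = np.empty(dim)
    for d in range(dim):
        coeffs = np.polyfit(times, codes[:, d], deg=order)  # least-squares fit
        out[d] = np.polyval(coeffs, t)
    return out

# Toy latent codes lying on a straight line in latent space (linear motion).
codes = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
z_half = interpolate_latents(codes, 1.5, order=1)  # midway between frames 1 and 2
```

For codes on a linear trajectory, first-order interpolation recovers the exact midpoint; higher orders matter when the latent trajectory curves between frames.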


Introduction

With the advances in video-capturing devices, there has been an increasing demand for high-quality videos. Video frame interpolation is among the most longstanding and challenging problems in video processing. It is inherently ill-posed: given two consecutive frames, many in-between solutions can be valid. Another class of approaches, termed Eulerian, eliminates the need for flow computation by characterizing motion over time at each fixed location.


