Abstract

In this work, we explore video frame inpainting, a task that lies at the intersection of general video inpainting, frame interpolation, and video prediction. Although our problem can be addressed by applying methods from other video interpolation or extrapolation tasks, doing so fails to leverage the additional context information that our problem provides. To this end, we devise a method specifically designed for video frame inpainting that is composed of two modules: a bidirectional video prediction module and a temporally-aware frame interpolation module. The prediction module makes two intermediate predictions of the missing frames, one conditioned on the preceding frames and the other on the following frames, using a shared convolutional LSTM-based encoder-decoder. The interpolation module blends the intermediate predictions by using time information and hidden activations from the video prediction module to resolve disagreements between the predictions. Our experiments demonstrate that our approach produces smoother and more accurate results than state-of-the-art methods for general video inpainting, frame interpolation, and video prediction.
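Since the abstract only outlines the two-module design, the following is a minimal PyTorch-style sketch of the idea as described: a shared ConvLSTM encoder-decoder predicts the missing frames once forward from the preceding context and once backward from the following context, and a blending step combines the two predictions using their hidden activations plus a normalized time step. All class names, shapes, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of bidirectional prediction + temporally-aware blending.
# Names, shapes, and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Single convolutional LSTM cell (simplified)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class BidirectionalPredictor(nn.Module):
    """Shared encoder-decoder, run once per temporal direction."""
    def __init__(self, ch=3, hid=32):
        super().__init__()
        self.cell = ConvLSTMCell(ch, hid)
        self.to_frame = nn.Conv2d(hid, ch, 3, padding=1)

    def predict(self, context, n_missing):
        b, _, _, hgt, wid = context.shape  # (batch, time, ch, H, W)
        h = context.new_zeros(b, self.cell.hid_ch, hgt, wid)
        state = (h, h.clone())
        # Encode the context frames.
        for t in range(context.shape[1]):
            out, state = self.cell(context[:, t], state)
        # Decode n_missing frames, feeding each prediction back in.
        frames, hiddens, x = [], [], self.to_frame(out)
        for _ in range(n_missing):
            out, state = self.cell(x, state)
            x = self.to_frame(out)
            frames.append(x)
            hiddens.append(out)
        return torch.stack(frames, 1), torch.stack(hiddens, 1)

def blend(fwd, bwd, fwd_h, bwd_h, merger):
    """Temporally-aware interpolation: per-pixel blending weights come
    from both directions' hidden activations plus a normalized time step."""
    n = fwd.shape[1]
    blended = []
    for t in range(n):
        # Normalized temporal position of this missing frame.
        tau = fwd.new_full((fwd.shape[0], 1, *fwd.shape[3:]), (t + 1) / (n + 1))
        # Backward predictions are in reversed order; index n-1-t aligns them.
        w = torch.sigmoid(merger(torch.cat([fwd_h[:, t], bwd_h[:, n - 1 - t], tau], 1)))
        blended.append((1 - w) * fwd[:, t] + w * bwd[:, n - 1 - t])
    return torch.stack(blended, 1)

if __name__ == "__main__":
    torch.manual_seed(0)
    pred = BidirectionalPredictor()
    merger = nn.Conv2d(2 * 32 + 1, 1, 3, padding=1)  # hiddens + time -> weight map
    before = torch.randn(2, 4, 3, 16, 16)            # preceding context frames
    after = torch.randn(2, 4, 3, 16, 16)             # following context frames
    fwd, fwd_h = pred.predict(before, n_missing=3)
    bwd, bwd_h = pred.predict(after.flip(1), n_missing=3)  # predict backward in time
    filled = blend(fwd, bwd, fwd_h, bwd_h, merger)
    print(filled.shape)  # torch.Size([2, 3, 3, 16, 16])
```

In this toy version the blending weight is a single sigmoid map, so each pixel of each missing frame is a convex combination of the two directional predictions; the time channel lets the weights shift toward the forward prediction near the preceding frames and toward the backward prediction near the following ones.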
