Abstract
Predicting multiple future frames from a given video is a challenging problem due to factors such as camera motion, dynamically moving objects, and occlusions. While recent deep learning methods have made significant progress on the video prediction problem, most methods predict only the immediate or a fixed number of future frames. To obtain longer-term frame predictions, existing techniques usually process the predicted frames iteratively, resulting in blurry or inconsistent predictions. In this work, we present a new approach that can predict an arbitrary number of future video frames with a single forward pass through the network. Instead of directly predicting a fixed number of future optical flows or frames, we learn temporal motion encodings, i.e., temporal motion basis vectors and a network to predict the coefficients. The learned motion basis can be easily extended to arbitrary length at inference time, enabling us to predict an arbitrary number of future frames. Experiments on benchmark datasets show that our approach performs favorably against several competitive techniques even in the next-frame prediction setting. When evaluated under 5-frame or 10-frame prediction settings, the proposed method achieves larger gains over state-of-the-art techniques that process the predictions iteratively.
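To illustrate the core idea of composing per-frame motion from a fixed set of basis coefficients, the following is a minimal sketch. The paper learns both the temporal basis and the coefficient-prediction network; here a cosine (DCT-like) parametric basis and random coefficients are assumptions standing in for both, purely to show how the same coefficients can yield flows for any requested horizon.

```python
# Illustrative sketch only: the learned temporal motion basis and the
# coefficient network from the paper are replaced by a cosine basis and
# random coefficients (assumptions), to show how an arbitrary number of
# per-frame flows can be composed from a fixed set of basis coefficients.
import numpy as np

def temporal_basis(num_frames, num_bases):
    """Build a (num_frames x num_bases) cosine basis; any length works."""
    t = np.arange(num_frames)[:, None] / num_frames          # (T, 1)
    k = np.arange(num_bases)[None, :]                        # (1, K)
    return np.cos(np.pi * (t + 0.5 / num_frames) * k)        # (T, K)

def compose_flows(coeffs, num_frames):
    """coeffs: (K, H, W, 2) per-basis flow coefficients (network output).
    Returns (T, H, W, 2) optical flows for any requested horizon T."""
    K = coeffs.shape[0]
    B = temporal_basis(num_frames, K)                        # (T, K)
    return np.einsum('tk,khwc->thwc', B, coeffs)

# Placeholder coefficients in place of the network's prediction.
coeffs = np.random.randn(8, 64, 64, 2).astype(np.float32)
flows_5 = compose_flows(coeffs, num_frames=5)    # 5 future flows
flows_10 = compose_flows(coeffs, num_frames=10)  # 10 from the same coefficients
print(flows_5.shape, flows_10.shape)             # (5, 64, 64, 2) (10, 64, 64, 2)
```

In this sketch, extending the prediction horizon only requires evaluating the temporal basis at more time steps; the coefficients are computed once, which is the property the abstract attributes to the learned motion basis.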