Abstract

Video prediction has developed rapidly with the boom of deep learning. As an important part of unsupervised representation learning, it plays an important role in anomalous behavior detection, autonomous driving, video games, and other fields. However, prediction methods based on optical flow estimation are susceptible to brightness changes and camera shake, and they struggle to predict occluded objects, while prediction methods based on pixel generation have difficulty fitting ambiguous and complex scenes, which leads to blurry predictions. In this work, we propose an end-to-end video prediction framework that combines an optical flow estimation module with a pixel generation module through a learnable mask weight to predict high-fidelity videos. To further improve prediction quality, we introduce adversarial training into the framework: a frame discriminator and a sequence discriminator ensure consistency between the spatio-temporal distributions of predicted and real video frames. Results of experiments on challenging datasets demonstrate the practicability and effectiveness of the proposed framework. On the one hand, it achieves quality on par with the latest models while requiring fewer parameters and predicting faster. On the other hand, ablation experiments demonstrate the effect of fusing the different modules and the effectiveness of adversarial training.
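The fusion described above can be sketched as a per-pixel convex combination of the two module outputs, weighted by the learnable mask. The helper below is a minimal NumPy illustration under assumed conventions (mask values in [0, 1], with 1 favoring the flow-warped frame); the paper's exact fusion details may differ.

```python
import numpy as np

def fuse_predictions(flow_pred, pixel_pred, mask):
    """Blend the optical-flow-warped frame and the pixel-generated frame.

    mask is a per-pixel weight in [0, 1]: 1 takes the flow-based
    prediction, 0 takes the generated prediction, values in between
    interpolate. (Hypothetical helper for illustration only.)
    """
    mask = np.clip(mask, 0.0, 1.0)
    return mask * flow_pred + (1.0 - mask) * pixel_pred

# Toy 2x2 single-channel frames: flow module predicts all ones,
# pixel module predicts all zeros, mask varies per pixel.
flow_pred = np.full((2, 2), 1.0)
pixel_pred = np.full((2, 2), 0.0)
mask = np.array([[1.0, 0.5],
                 [0.0, 0.25]])
fused = fuse_predictions(flow_pred, pixel_pred, mask)
print(fused)
```

In a trained network the mask would be produced by a learned layer (e.g. a sigmoid-activated convolution), so gradients flow to both modules and the model learns where each branch is more reliable.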

Highlights


  • To compare the performance of the proposed model with related works, three types of representative models of different video prediction methods are used for comparison: (1) models based on pixel generation: BeyondMSE [11], PredNet [8], CycleGAN [13], ContextVP [9]

  • The fusion of the optical flow estimation module and the pixel generation module can greatly improve the prediction effect, which verifies that the weighted fusion of two different modules with a learnable mask can better complement each other


Summary

A Video Prediction Method Based on Optical Flow Estimation and Pixel Generation

This work was supported in part by the Science Foundation of The China (Xi'an) Institute for Silk Road Research (2019YA07 and 2019YB05), the National Statistical Science Research Project (2016LY59), and in part by the Research Foundation of Xi'an University of Finance and Economics under Grant 18FCJH02.

INTRODUCTION
RELATED WORK
PIXEL GENERATION
FUSION OF OPTICAL FLOW AND PIXEL GENERATION
ADVERSARIAL TRAINING
EXPERIMENTS AND ANALYSIS
EVALUATION METRICS

