Abstract

Video prediction is challenging due to the pixel-level precision requirement and the difficulty of capturing scene dynamics. Most approaches tackle these problems with pixel-level reconstruction objectives and two decomposed branches, yet they still suffer from blurry generations or dramatic degradation in long-term prediction. In this paper, we propose a Motion-Aware Feature Enhancement (MAFE) network for video prediction that produces realistic future frames and achieves relatively long-term predictions. First, a Channel-wise and Spatial Attention (CSA) module is designed to extract motion-aware features; it enhances the contribution of important motion details during encoding and thereby improves the discriminability of the attention map used for frame refinement. Second, a Motion Perceptual Loss (MPL) is proposed to guide the learning of temporal cues, which benefits robust long-term video prediction. Extensive experiments on three human activity video datasets (KTH, Human3.6M, and PennAction) demonstrate the effectiveness of the proposed video prediction model compared with state-of-the-art approaches.
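The abstract does not specify the internal design of the CSA module or the exact form of the MPL. As a rough illustration only, the sketch below pairs a CBAM-style channel-then-spatial attention block with a perceptual loss computed on frame differences; both are hypothetical stand-ins for the paper's components, and `feat_net` (e.g., a truncated pretrained VGG) is an assumed feature extractor, not part of the original description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSpatialAttention(nn.Module):
    """Hypothetical CSA stand-in: channel attention followed by spatial attention."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight each channel.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.channel_mlp(F.adaptive_avg_pool2d(x, 1).view(b, c))
        mx = self.channel_mlp(F.adaptive_max_pool2d(x, 1).view(b, c))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention map over channel-wise mean/max features.
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(pooled))


def motion_perceptual_loss(pred: torch.Tensor, target: torch.Tensor,
                           feat_net: nn.Module) -> torch.Tensor:
    """Assumed MPL form: perceptual (feature-space) loss on temporal differences.

    pred/target: (B, T, C, H, W) frame sequences. Matching features of
    consecutive-frame differences penalizes motion errors rather than
    per-pixel appearance alone.
    """
    pred_motion = pred[:, 1:] - pred[:, :-1]    # frame differences
    tgt_motion = target[:, 1:] - target[:, :-1]
    b, t, c, h, w = pred_motion.shape
    fp = feat_net(pred_motion.reshape(b * t, c, h, w))
    ft = feat_net(tgt_motion.reshape(b * t, c, h, w))
    return F.l1_loss(fp, ft)
```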
