Abstract

Frame extrapolation predicts future frames from past (reference) frames; it has been studied intensively in computer vision research and has great potential in video coding. Recently, a number of studies have applied deep networks to frame extrapolation, with some success. However, due to the complex and diverse motion patterns in natural video, it is still difficult to extrapolate high-fidelity frames directly from reference frames. To address this problem, we introduce reference frame alignment as a key technique for deep network-based frame extrapolation. We propose to align the reference frames, e.g. using block-based motion estimation and motion compensation, and then extrapolate from the aligned frames with a trained deep network. Since the alignment, performed as a preprocessing step, effectively reduces the diversity of the network input, we observe that the network is easier to train and the extrapolated frames are of higher quality. We verify the proposed technique in video coding, using the extrapolated frame for inter prediction in High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC). We investigate different schemes, including whether to align the target frame with the reference frames, and whether to perform motion estimation on the extrapolated frame. We conduct a comprehensive set of experiments to study the efficiency of the proposed method and to compare the different schemes. Experimental results show that our proposal achieves on average 5.3% and 2.8% BD-rate reduction in the Y component compared to HEVC, under the low-delay P and low-delay B configurations, respectively. Our proposal also performs much better than frame extrapolation without reference frame alignment.
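The alignment step described above relies on block-based motion estimation and motion compensation. As a rough illustration only (the function names, block size, and search range below are our own choices, not details from the paper), a minimal full-search block-matching sketch in Python could look like:

```python
import numpy as np

def block_motion_estimate(ref, tgt, block=8, search=4):
    """Full-search block matching: for each block of tgt, find the
    best-matching block in ref within +/-search pixels (SAD metric).
    Returns a motion-vector field of shape (H//block, W//block, 2)."""
    H, W = tgt.shape
    mvs = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(H // block):
        for bx in range(W // block):
            y0, x0 = by * block, bx * block
            cur = tgt[y0:y0 + block, x0:x0 + block].astype(int)
            best, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + block > H or x + block > W:
                        continue  # candidate block falls outside the frame
                    cand = ref[y:y + block, x:x + block].astype(int)
                    sad = np.abs(cur - cand).sum()
                    if best is None or sad < best:
                        best, best_mv = sad, (dy, dx)
            mvs[by, bx] = best_mv
    return mvs

def motion_compensate(ref, mvs, block=8):
    """Warp ref toward the target frame using the block motion field,
    producing an aligned reference frame."""
    out = np.zeros_like(ref)
    for by in range(mvs.shape[0]):
        for bx in range(mvs.shape[1]):
            dy, dx = mvs[by, bx]
            y0, x0 = by * block, bx * block
            out[y0:y0 + block, x0:x0 + block] = \
                ref[y0 + dy:y0 + dy + block, x0 + dx:x0 + dx + block]
    return out
```

In the proposed pipeline, each reference frame would first be compensated in this way so that the deep extrapolation network sees inputs with much of the inter-frame motion already removed; production codecs such as HEVC/VVC use far more elaborate motion models (sub-pel interpolation, variable block sizes) than this sketch.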
