Deep Video Prediction Network-Based Inter-Frame Coding in HEVC

Jung-Kyung Lee, Seunghyun Cho, Je-Won Kang, Nayoung Kim

doi:10.1109/access.2020.2993566

Abstract

In this paper, we propose a novel Convolutional Neural Network (CNN)-based video coding technique that uses a video prediction network (VPN) to support enhanced motion prediction in High Efficiency Video Coding (HEVC). Specifically, we design a CNN VPN to generate a virtual reference frame (VRF), synthesized from previously coded frames, to improve coding efficiency. The proposed VPN uses two sub-VPN architectures in cascade to predict the current frame at the same time instance. The VRF is expected to have higher temporal correlation with the current frame than a conventional reference frame, and thus it is substituted for a conventional reference frame. The proposed technique is incorporated into the HEVC inter-coding framework. In particular, the VRF is managed in an HEVC reference picture list, so that each prediction unit (PU) can choose a better prediction signal through rate-distortion optimization without any additional side information. Furthermore, we adaptively modify the HEVC inter-prediction mechanisms of the Advanced Motion Vector Prediction and Merge modes when the current PU uses the VRF as a reference frame. In this manner, the proposed technique can exploit the PU-wise multi-hypothesis prediction techniques in HEVC. Since the proposed VPN can perform both video interpolation and extrapolation, it can be used for both the Random Access (RA) and Low Delay B (LD) coding configurations. Experimental results show that the proposed technique provides coding gains of -2.9% and -5.7% in the RA and LD coding configurations, respectively, compared to the HEVC reference software, HM version 16.6.
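The PU-level choice between a conventional reference and the VRF can be illustrated with a toy example. The sketch below is not the HM implementation; it only assumes the standard Lagrangian cost J = D + λR, with distortion measured as the sum of squared errors. The function names (`rd_cost`, `choose_reference`), the block values, the rate estimates, and the value of λ are all hypothetical and chosen for illustration.

```python
import numpy as np

def rd_cost(block, prediction, rate_bits, lam):
    """Lagrangian cost J = D + lam * R, with D as the sum of squared errors."""
    diff = block.astype(np.int64) - prediction.astype(np.int64)
    distortion = int(np.sum(diff * diff))
    return distortion + lam * rate_bits

def choose_reference(block, candidates, lam):
    """Pick the reference index with the lowest RD cost.

    candidates maps a reference index to (prediction_block, rate_bits).
    The VRF is assumed to occupy an ordinary index in the reference
    picture list, so no extra syntax is needed to signal it.
    """
    costs = {idx: rd_cost(block, pred, bits, lam)
             for idx, (pred, bits) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs

# Hypothetical 2x2 PU and two candidate predictions.
current = np.array([[10, 12], [14, 16]], dtype=np.uint8)
conventional = np.array([[8, 12], [14, 20]], dtype=np.uint8)  # SSE = 20
virtual = np.array([[10, 13], [14, 16]], dtype=np.uint8)      # SSE = 1

# Index 0: conventional reference, index 1: VRF (illustrative rate estimates).
candidates = {0: (conventional, 4), 1: (virtual, 6)}
best_idx, costs = choose_reference(current, candidates, lam=2.0)
```

In this toy case the VRF prediction costs more bits but has much lower distortion, so it wins the comparison. When the VRF is a poor predictor, the same rule falls back to the conventional reference, which is why the selection can be made per PU.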

Highlights

Convolutional Neural Networks (CNNs) have become a subject of considerable attention in video coding
While the CNN video prediction network (VPN) architecture and the separable convolution scheme were originally developed for bidirectional prediction in [26], they can be extended, by retraining the network parameters, to support either unidirectional or bidirectional prediction [28], [30] for more general use
The virtual reference frame (VRF) is substituted for a conventional reference frame to improve coding efficiency in a codec

Summary

INTRODUCTION

Convolutional Neural Networks (CNNs) have become a subject of considerable attention in video coding. Temporal information enhanced by CNNs brings significant improvements in coding performance. In our previous works [7], [25], a deep video prediction network (VPN) [26], originally developed for video frame rate up-conversion, was applied to video coding by synthesizing a reference frame. In this paper, inspired by our previous works and the recent advances in VPNs, we focus on improving the quality of the generated video frames and propose a novel CNN-based inter-prediction technique for video coding. While the CNN VPN architecture and the separable convolution scheme were originally developed for bidirectional prediction in [26], they can be extended, by retraining the network parameters, to support either unidirectional or bidirectional prediction [28], [30] for more general use. Because we train the network using the original past and future frames, only a single network is needed for different quantization parameters (QPs).
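To make the separable convolution idea concrete, the following is a minimal NumPy sketch. It assumes the scheme in [26] follows the usual adaptive separable convolution formulation, in which each output pixel is the sum, over two input frames, of a local patch filtered by a per-pixel vertical and horizontal 1-D kernel pair. In a real VPN the kernels are predicted by the CNN; here they are set by hand, and the function name `synthesize_frame`, the kernel size, and the frame values are hypothetical.

```python
import numpy as np

def synthesize_frame(frame_a, frame_b, kernels_a, kernels_b):
    """Synthesize a frame from two inputs with per-pixel separable kernels.

    kernels_x is a pair (k_vertical, k_horizontal), each of shape (H, W, n)
    with n odd. Each output pixel is sum over frames of kv^T @ patch @ kh,
    where patch is the n x n neighbourhood around that pixel.
    """
    n = kernels_a[0].shape[-1]
    pad = n // 2
    h, w = frame_a.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Replicate border pixels so every position has a full n x n patch.
    padded = [np.pad(f.astype(np.float64), pad, mode="edge")
              for f in (frame_a, frame_b)]
    for y in range(h):
        for x in range(w):
            for p, (kv, kh) in zip(padded, (kernels_a, kernels_b)):
                patch = p[y:y + n, x:x + n]
                out[y, x] += kv[y, x] @ patch @ kh[y, x]
    return out

# Hand-set kernels (stand-ins for CNN outputs): a centred delta vertically
# and 0.5 * delta horizontally, so the result is the average of both frames.
H, W, N = 4, 5, 3
kv = np.zeros((H, W, N)); kv[..., N // 2] = 1.0
kh = np.zeros((H, W, N)); kh[..., N // 2] = 0.5

rng = np.random.default_rng(0)
frame_a = rng.integers(0, 256, size=(H, W)).astype(np.float64)
frame_b = rng.integers(0, 256, size=(H, W)).astype(np.float64)
vrf = synthesize_frame(frame_a, frame_b, (kv, kh), (kv, kh))
```

The same routine covers both prediction directions: for interpolation the two inputs would be a past and a future frame, while for extrapolation both would be past frames, with only the (retrained) kernels differing.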

FRAME PREDICTION PERFORMANCE EVALUATION

RANDOM ACCESS AND LOW DELAY CODING

REORGANIZATION OF REFERENCE PICTURE LISTS

CODING PERFORMANCE AND COMPLEXITY EVALUATION AND ANALYSIS

PERFORMANCE ANALYSIS IN VARIOUS CONFIGURATIONS

CONCLUSION