Abstract

Convolutional neural networks have achieved remarkable progress in compressed video quality enhancement under the High Efficiency Video Coding (HEVC) standard. However, most existing methods focus on single-frame quality enhancement and neglect the abundant temporal and spatial information in neighboring frames. In this paper, we propose a spatial-temporal fusion convolutional neural network (STEF-CNN) that exploits both spatial and temporal information to improve the performance of the in-loop filter in HEVC. Specifically, the STEF-CNN first applies a pre-denoising network that processes the compressed video frame by frame; this pre-denoising step alleviates the impact of noise and blocking artifacts. The denoised frames are then passed to a spatial-temporal fusion module that selects valuable temporal and spatial information. The fused frames are finally fed to a quality enhancement network built on residual learning and dense connections. In this way, the STEF-CNN captures rich information from consecutive neighboring frames. Extensive experimental results demonstrate the effectiveness of the proposed method: the STEF-CNN achieves an 11.53% BD-BR reduction in the all-intra (AI) configuration and a 10.20% BD-BR reduction in the random-access (RA) configuration.
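The abstract describes a three-stage pipeline: per-frame pre-denoising, spatial-temporal fusion of neighboring frames, and a residual/dense enhancement network. The following is a minimal, hypothetical PyTorch sketch of that pipeline structure only; the module names, channel widths, number of input frames, and layer counts are illustrative assumptions and do not reproduce the paper's exact architecture.

```python
# Hypothetical sketch of the pipeline outlined in the abstract:
# pre-denoising -> spatial-temporal fusion -> residual/dense enhancement.
# All hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn


class PreDenoiseNet(nn.Module):
    """Per-frame pre-denoising to suppress coding noise and blocking artifacts."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, frame):
        # Predict a noise residual and subtract it implicitly via addition.
        return frame + self.body(frame)


class SpatialTemporalFusion(nn.Module):
    """Fuse the denoised target frame with its neighboring frames."""
    def __init__(self, num_frames=3, channels=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(num_frames, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, frames):  # frames: (N, num_frames, H, W)
        return self.fuse(frames)


class DenseBlock(nn.Module):
    """DenseNet-style feature reuse with a local residual connection."""
    def __init__(self, channels=64, growth=32, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels + i * growth, growth, 3, padding=1)
             for i in range(layers)]
        )
        self.local = nn.Conv2d(channels + layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return x + self.local(torch.cat(feats, dim=1))


class STEFCNN(nn.Module):
    """Denoise each frame, fuse the clip, then enhance the center frame."""
    def __init__(self, num_frames=3, channels=64):
        super().__init__()
        self.denoise = PreDenoiseNet(channels)
        self.fusion = SpatialTemporalFusion(num_frames, channels)
        self.enhance = nn.Sequential(DenseBlock(channels), DenseBlock(channels))
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, frames):  # frames: (N, num_frames, H, W), luma only
        n, t, h, w = frames.shape
        denoised = self.denoise(frames.reshape(n * t, 1, h, w)).reshape(n, t, h, w)
        target = denoised[:, t // 2 : t // 2 + 1]  # center frame to be enhanced
        return target + self.tail(self.enhance(self.fusion(denoised)))


if __name__ == "__main__":
    model = STEFCNN()
    clip = torch.rand(2, 3, 64, 64)   # two clips of three consecutive frames
    print(model(clip).shape)          # -> torch.Size([2, 1, 64, 64])
```

The global residual at the end (adding the enhanced detail back onto the center frame) mirrors the residual-learning idea mentioned in the abstract, where the network only has to predict the correction rather than the full frame.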
