A Study on the Impact of Training Data in CNN-Based Super-Resolution for Low Bitrate End-to-End Video Coding

Fatemeh Nasiri, Wassim Hamidouche, Luce Morin, Nicolas Dhollande, Gildas Cocherel

doi:10.1109/ipta50016.2020.9286717

Abstract

This study investigates the effectiveness of Super-Resolution (SR) methods based on Convolutional Neural Networks (CNNs) in low bitrate video coding, with a focus on the Versatile Video Coding (VVC) standard. Video transmission over networks with limited bandwidth is a common challenge in many applications. One solution is to adopt SR methods, whose main principle is to spatially down-sample the input sequence prior to encoding and then up-sample the decoded sequence before displaying it. For a fixed target bandwidth, a finer quantization can be applied to the low-resolution sequence than to the high-resolution one, so that the high-quality reconstructed pixels help in retrieving the lost information. However, most CNN-based SR methods are designed for single images and focus solely on the original, uncompressed input signal. As a result, their trained networks lack an understanding of compression artifacts. In this study, we test the hypothesis that training CNN-based SR methods with compressed sequences outperforms training with uncompressed ones. The assumption is that such training allows the SR methods to learn compression artifacts and differentiate them from actual texture information. To this end, state-of-the-art CNN-based SR methods are tested with compressed and uncompressed training sets. Experiments show that the use of compressed training data brings, on average, an additional bitrate saving of 6% in terms of BD-Rate.
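The down-sample, encode, decode, up-sample chain described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the function names are hypothetical, uniform quantization stands in for a real VVC encoder and decoder, nearest-neighbour interpolation stands in for a trained CNN-based SR model, and the down-sampling factor is fixed at 2 in each dimension. The `bits_per_pixel` helper shows why a fixed bitrate leaves room for finer quantization at the lower resolution.

```python
def downsample_2x(frame):
    """Average each 2x2 block. `frame` is a list of rows with even dimensions."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def quantize(frame, step):
    """Toy stand-in for lossy coding (assumption): uniform quantization.

    A smaller `step` means finer quantization and higher reconstruction quality.
    """
    return [[round(v / step) * step for v in row] for row in frame]


def upsample_2x(frame):
    """Nearest-neighbour up-sampling; placeholder for a trained CNN SR model."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))  # repeat each row vertically
    return out


def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average bit budget per pixel at a given bitrate, resolution and frame rate."""
    return bitrate_bps / (width * height * fps)


def sr_coding_pipeline(frame, step):
    """Down-sample, 'code' at low resolution, then up-sample for display."""
    low_res = downsample_2x(frame)
    decoded = quantize(low_res, step)
    return upsample_2x(decoded)
```

Halving both dimensions quarters the pixel count, so `bits_per_pixel(1_000_000, 960, 540, 30)` is four times `bits_per_pixel(1_000_000, 1920, 1080, 30)`. Under the abstract's hypothesis, the SR stage would be trained on frames that have already passed through the coding step rather than on pristine low-resolution frames.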

Full Text