Efficient Intra Bitrate Transcoding for Screen Content Coding Based on Convolutional Neural Network

Wei Kuang, Sik-Ho Tsang, Yui-Lam Chan

doi:10.1109/access.2019.2933029

Open Access

Abstract

The Screen Content Coding (SCC) extension of High Efficiency Video Coding (HEVC) was developed to improve the coding efficiency of screen content videos. To meet the diverse network requirements of different clients, bitrate transcoding for SCC is desirable. This problem can be solved by a conventional brute-force transcoder (CBFT), which cascades an original decoder with an original encoder. However, the re-encoding part of CBFT incurs high computational complexity. This paper presents a convolutional neural network based bitrate transcoder (CNN-BRT) for SCC. By utilizing information from both the decoder side and the encoder side, CNN-BRT makes a fast prediction for all coding units (CUs) of a coding tree unit (CTU) in a single test. At the decoder side, decoded optimal mode maps that reflect the optimal modes and CU partitions in a CTU are derived. At the encoder side, the raw samples in a CTU are collected. Both are then fed to CNN-BRT to make a fast prediction. To imitate the optimal mode selection in the original re-encoding part, CNN-BRT uses a loss function that takes both the sub-optimal modes and the final optimal modes into consideration. Compared with the HEVC-SCC reference software SCM-3.0, the proposed CNN-BRT reduces encoding time by 54.86% on average with a negligible Bjontegaard delta bitrate increase of 1.01% under the all-intra configuration.
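
The idea of a loss that accounts for both sub-optimal and final optimal modes can be illustrated with a minimal sketch. It assumes, hypothetically, that each classifier emits per-CU logits over a small set of mode classes and that the two terms are softmax cross-entropies added together with an illustrative weight `aux_weight`. The function names, the class set, and the weighting scheme are assumptions for illustration, not the paper's exact formulation or Caffe's API.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities."""
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the ground-truth class index."""
    return -math.log(softmax(logits)[label])


def mean_ce(pairs):
    """Average cross-entropy over a list of (logits, label) pairs, one per CU."""
    return sum(cross_entropy(logits, label) for logits, label in pairs) / len(pairs)


def combined_loss(aux_pairs, final_pairs, aux_weight=0.5):
    """Hypothetical combined objective:
    aux_pairs   -> auxiliary-classifier outputs vs. sub-optimal mode labels
    final_pairs -> final-classifier outputs vs. final optimal mode labels
    aux_weight  -> illustrative weight on the auxiliary term (an assumption)
    """
    return mean_ce(final_pairs) + aux_weight * mean_ce(aux_pairs)


if __name__ == "__main__":
    # Two CUs, three hypothetical mode classes (e.g. 0=intra, 1=IBC, 2=palette).
    aux = [([2.0, 0.1, -1.0], 0), ([0.2, 1.5, 0.0], 1)]
    final = [([1.8, 0.3, -0.5], 0), ([0.1, 0.4, 2.2], 2)]
    print(round(combined_loss(aux, final), 4))
```

Adding the auxiliary term lets the sub-optimal mode labels supervise the intermediate classifiers, while the final term drives the prediction that is actually used for the fast decision.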

Highlights

Screen content videos are captured from the display screens of various electronic devices, and they usually show a mixed content of camera-captured natural image blocks (NIBs) and computer-generated screen content blocks (SCBs)
For simplicity of comparison, the proposed convolutional neural network based bitrate transcoder (CNN-BRT) has been implemented in HM-16.2+SCM-3.0, the same reference software used by the only prior work on fast Screen Content Coding (SCC) bitrate transcoding [29], and the proposed Caffe model can be found on our website [41]

Summary

INTRODUCTION

Screen content videos have gained popularity with the fast development of mobile and cloud technologies, and they have many applications such as online education, video conference with document sharing, remote desktop, and wireless display [1].

Between the high-bitrate and low-bitrate streams, 79.16%, 70.85%, and 66.65% of the area shares the same optimal modes for QP differences (ΔQP) of 2, 4, and 6, respectively. Based on this observation, information from the decoder side is useful for speeding up the re-encoding process.

4) AUXILIARY CLASSIFIERS AND FINAL CLASSIFIERS

As reviewed in Section II.A, an SCC encoder first decides the sub-optimal mode of a CU from its local content, and then decides the final optimal mode of the CU by comparing it with CUs at other depth levels. Since the receptive field of each element in the feature maps of conv6–conv is a local CU, Auxiliary Classifier0–Auxiliary Classifier are designed to predict the sub-optimal modes for CUs at depth levels 0 to 3, respectively. In CNN-BRT, each convolutional or deconvolutional layer is followed by a rectified linear unit (ReLU) activation function, except for conv6–conv, where softmax is used to generate the output labels.
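
The shared-area statistic between the two streams can be made concrete with a small sketch. It assumes, hypothetically, that each stream's decoded CUs in a 64×64 CTU are available as `(x, y, size, mode)` tuples, that modes are labeled with strings such as `"INTRA"` and `"IBC"`, and that area is counted in 8×8 units. `mode_map` and `shared_area_ratio` are illustrative names, not part of SCM-3.0, and this is not necessarily how the reported percentages were measured.

```python
CTU_SIZE = 64   # CTU width/height in luma samples
UNIT = 8        # assumed bookkeeping unit: one label per 8x8 block
GRID = CTU_SIZE // UNIT


def mode_map(cus):
    """Paint a GRID x GRID map of mode labels from decoded CUs.

    cus: list of (x, y, size, mode) tuples, CTU-relative, assumed to tile the CTU.
    """
    grid = [[None] * GRID for _ in range(GRID)]
    for x, y, size, mode in cus:
        for row in range(y // UNIT, (y + size) // UNIT):
            for col in range(x // UNIT, (x + size) // UNIT):
                grid[row][col] = mode
    if any(cell is None for r in grid for cell in r):
        raise ValueError("CUs do not cover the whole CTU")
    return grid


def shared_area_ratio(map_a, map_b):
    """Fraction of CTU area (equal-size units) with the same mode in both maps."""
    cells = [(a, b) for row_a, row_b in zip(map_a, map_b) for a, b in zip(row_a, row_b)]
    return sum(a == b for a, b in cells) / len(cells)


if __name__ == "__main__":
    high = mode_map([(0, 0, 64, "INTRA")])
    low = mode_map([(0, 0, 32, "INTRA"), (32, 0, 32, "INTRA"),
                    (0, 32, 32, "INTRA"), (32, 32, 32, "IBC")])
    print(shared_area_ratio(high, low))  # 0.75
```

In this toy example, the CTU is a single intra CU in the high-bitrate stream and four 32×32 CUs in the low-bitrate stream, one of which switches to IBC, so three quarters of the area keeps the same mode.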

TRAINING STRATEGY

EXPERIMENTAL RESULTS

CONCLUSION