Abstract

This paper proposes an optimal rate control model based on deep neural network (DNN) features to improve coding tree unit (CTU)-level rate control in High Efficiency Video Coding (HEVC) for conversational videos. The proposed algorithm extracts high-level features from the original and previously reconstructed CTU blocks using a predefined DNN model, the visual geometry group (VGG-16) network. The correlation between these high-level features and the quantization parameter (QP) values of previously coded CTUs is then explored, with respect to subjective visual characteristics, to estimate the CTU-level rate control model parameters (α and β) and the bit allocation of each CTU. This paper also proposes a new model for estimating λ for each CTU by improving its relationship with the estimated bits per pixel, in order to control the rate and the relative distortion. Furthermore, the λ and QP boundary settings are adjusted based on the proposed perceptual model to ensure the rate control accuracy of each CTU. Compared with the rate control model in HM-16.20, experiments with the proposed algorithm reveal higher bitrate accuracy and an average BD-rate gain in terms of the PSNR, SSIM, and MS-SSIM metrics under the low-delay-P configuration.

Highlights

  • Rate control in all video coding applications is important for optimizing visual quality by appropriately allocating bits at each rate control stage, at the group-of-picture level, picture level, and block level, for a given bitrate condition

  • This paper presents a coding tree unit (CTU)-level rate control algorithm for a high efficiency video coding (HEVC) encoder based on a deep neural network (DNN) feature

  • This paper presents the development of a new estimation model for the α and β parameters as well as estimation models for the bit allocation, parameter λ, quantization parameter (QP) decision, and boundary adjustment of both λ and QP for the CTU-level rate control in the HEVC encoder


Summary

INTRODUCTION

Rate control in all video coding applications is important for optimizing visual quality by appropriately allocating bits at each rate control stage, at the group-of-picture level, picture level, and block level, for a given bitrate condition. The proposed algorithm explores visual feature extraction based on a particular convolutional layer of a DNN model for CTU-level rate control, with the aim of improving rate control performance for conversational video services. Both spatial and temporal features are considered when estimating the α and β parameters, which in turn influence the design of the other estimation processes, including the estimation of bit allocation, λ, and QP. To improve the correlations between the estimated α and λ and between the estimated β and λ in the CTU-level rate control, the estimation of the α and β parameters is analyzed by exploiting the relationship between the VGG feature and the QP value shown in Figure 4 before the encoding process is completed. In (7), ηCTU is calculated to represent the weight of the current CTU so as to satisfy the TCTU constraint.
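The pipeline summarized above, weighting the remaining bit budget per CTU, mapping bits per pixel to λ, and deriving a bounded QP, can be sketched in the R-λ style used by the HM reference software. This is a minimal illustration, not the paper's method: the per-CTU weights (ηCTU) would come from the DNN-feature model and are replaced here by placeholder values, and the clipping ranges for λ are illustrative assumptions; only the λ-to-QP constants (4.2005 and 13.7122) and the 0-51 QP range are the standard HM/HEVC values.

```python
import math

# Sketch of CTU-level rate control in the R-lambda style:
#   1) allocate bits per CTU in proportion to a weight (eta_CTU),
#   2) map bits-per-pixel to lambda via lambda = alpha * bpp^beta,
#   3) map lambda to QP, and clip both against boundary settings.
# The weights below are placeholders standing in for the DNN-feature
# model; the lambda clip range is an illustrative assumption.

def allocate_ctu_bits(remaining_bits, weights):
    """Split the remaining picture budget across CTUs in proportion
    to their weights (eta_CTU), satisfying the T_CTU constraint."""
    total = sum(weights)
    return [remaining_bits * w / total for w in weights]

def lambda_from_bpp(alpha, beta, bpp):
    """R-lambda model: lambda = alpha * bpp^beta."""
    return alpha * (bpp ** beta)

def qp_from_lambda(lam):
    """HM logarithmic lambda-to-QP mapping."""
    return int(round(4.2005 * math.log(lam) + 13.7122))

def clip(value, low, high):
    """Bound a value to [low, high] (boundary adjustment)."""
    return max(low, min(high, value))

# Toy walk-through for one CTU, using HM's common initial model
# parameters alpha = 3.2003 and beta = -1.367.
bits = allocate_ctu_bits(remaining_bits=16384, weights=[1.0, 2.0, 1.0])
bpp = bits[1] / (64 * 64)              # 64x64-pixel CTU
lam = clip(lambda_from_bpp(3.2003, -1.367, bpp), 0.1, 10000.0)
qp = clip(qp_from_lambda(lam), 0, 51)  # valid HEVC QP range
```

In the full algorithm, α and β are re-estimated per CTU from the VGG feature and QP correlation rather than held at their initial values, and the clipping bounds come from the proposed perceptual model.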

ESTIMATION OF λ AND QP FOR THE PROPOSED CTU-LEVEL RATE CONTROL
EXPERIMENTAL RESULTS
OBJECTIVE PERFORMANCE EVALUATIONS
COMPLEXITY PERFORMANCE EVALUATIONS
CONCLUSIONS