Versatile Video Coding (VVC), the state-of-the-art video coding standard, was developed by the Joint Video Experts Team (JVET) of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) in 2020. Although VVC provides powerful coding performance, it incurs tremendous computational complexity in determining the optimal coding mode during the encoding process. In particular, VVC adopted bi-prediction with CU-level weight (BCW) as one of its new tools; BCW enhances the coding efficiency of conventional bi-prediction by assigning different weights to the two prediction blocks during inter prediction. In this study, we investigate the statistical characteristics of input features that are correlated with the BCW mode and define four useful categories of features to facilitate the inter prediction of VVC. With the investigated input features, a lightweight neural network with a multilayer perceptron (MLP) architecture is designed to provide high accuracy at low complexity. We propose a fast BCW mode decision method based on this lightweight MLP to reduce the computational complexity of the weighted multiple bi-prediction in the VVC encoder. The experimental results show that the proposed method reduces the BCW encoding complexity by up to 33% with negligible coding loss compared to the VVC test model (VTM) under the random-access (RA) configuration.
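To make the BCW mechanism concrete, the following is a minimal sketch (not the authors' code) of how VVC forms the weighted bi-prediction that the encoder must evaluate for each candidate weight; the function name and the example blocks are illustrative, while the blending formula and weight sets follow the VVC specification.

```python
import numpy as np

# VVC BCW blends the two motion-compensated predictions P0 and P1 as
#   P_bcw = ((8 - w) * P0 + w * P1 + 4) >> 3
# where the CU-level weight w is chosen from {3, 4, 5} in random-access
# configurations and from {-2, 3, 4, 5, 10} in low-delay configurations.
BCW_WEIGHTS_RA = (3, 4, 5)

def bcw_bi_prediction(p0: np.ndarray, p1: np.ndarray, w: int) -> np.ndarray:
    """Blend two prediction blocks with a CU-level weight w (out of 8)."""
    return ((8 - w) * p0.astype(np.int32) + w * p1.astype(np.int32) + 4) >> 3

# Illustrative 10-bit prediction blocks from reference lists L0 and L1.
p0 = np.random.randint(0, 1024, size=(8, 8))
p1 = np.random.randint(0, 1024, size=(8, 8))

# The encoder normally rate-distortion tests every candidate weight; a fast
# BCW mode decision, as proposed in the paper, prunes this loop so that only
# the weights predicted to be promising are evaluated.
candidates = {w: bcw_bi_prediction(p0, p1, w) for w in BCW_WEIGHTS_RA}
```

Evaluating fewer entries of `candidates` is where the complexity saving comes from: each skipped weight avoids one weighted blend plus its rate-distortion cost computation.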