Abstract

The Versatile Video Coding (VVC) standard is currently being prepared as the latest video coding standard of the ITU-T and ISO/IEC. The primary goal of VVC, expected to be finalized in 2020, is to further improve compression performance compared to its predecessor, HEVC. The frame-level, slice-level, and wavefront parallel processing (WPP) schemes available in the VTM (VVC Test Model) do not fully utilize the CPU capabilities of today's multi-core systems. Moreover, the VTM decoder processes its decoding tasks sequentially, a design that is not parallelization friendly. This paper proposes redesigned decoding tasks that parallelize the decoder using (1) load-balanced task parallelization and (2) CTU (Coding Tree Unit) based data parallelization. The design overcomes the limitations of the existing parallelization techniques by fully utilizing the available CPU computation resources without compromising coding efficiency or memory bandwidth. The parallelization of CABAC and the slice decoding tasks is based on a load-sharing scheme, while the parallelization of each sub-module of the slice decoding task uses CTU-level data parallelization. The parallelization scheme may either remain restricted to an individual decoding task or exploit parallelism between tasks. These parallelization techniques achieve real-time VVC decoding on multi-core CPUs for bitstreams generated with VTM5.0 in the Random-Access configuration. An overall average decoding time reduction of 88.97% (relative to the VTM5.0 decoder) is achieved for 4K sequences on a 10-core processor.
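
The CTU-level data parallelization mentioned above can be illustrated with a minimal C++ sketch. It is not the paper's implementation or VTM code: the function name `forEachCtuParallel` and its parameters are hypothetical, and the sketch assumes a decoding stage in which CTUs can be processed independently of one another. Stages that depend on neighboring CTUs would need additional synchronization that is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of VTM): applies `processCtu` to every CTU
// index in [0, numCtus) using `numThreads` worker threads.
// Workers pull the next CTU index from a shared atomic counter, so a thread
// that finishes early simply takes more CTUs (dynamic load sharing).
// Assumption: the stage has no dependencies between CTUs.
void forEachCtuParallel(std::size_t numCtus, unsigned numThreads,
                        const std::function<void(std::size_t)>& processCtu) {
    assert(numThreads > 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        for (;;) {
            // Claim the next unprocessed CTU index.
            const std::size_t ctu = next.fetch_add(1, std::memory_order_relaxed);
            if (ctu >= numCtus) {
                break;  // no CTUs left for this worker
            }
            processCtu(ctu);
        }
    };

    std::vector<std::thread> pool;
    pool.reserve(numThreads);
    for (unsigned t = 0; t < numThreads; ++t) {
        pool.emplace_back(worker);
    }
    // Joining the workers makes all of their writes visible to the caller.
    for (auto& th : pool) {
        th.join();
    }
}
```

A caller would pass the number of CTUs in the picture and a callback for one sub-module, for example `forEachCtuParallel(numCtus, 10, filterOneCtu)`, where `filterOneCtu` is again a placeholder name. Handing out indices through a shared counter rather than fixed per-thread ranges is one simple way to keep cores busy when CTUs differ in cost.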
