GPGPU implementation of VP9 in-loop deblocking filter and improvements for AV1 codec

Zhijun Lei, Zhipin Deng, Srinath Reddy, Victor Cherepanov

doi:10.1109/icip.2017.8296416

Abstract

This paper describes the algorithm and processing flow of the in-loop deblocking filter in the VP9 coding standard, one of the most computationally intensive tools in the VP9 codec. Because of its inherent data dependencies, the algorithm is very difficult to implement efficiently on massively parallel computing architectures such as General-Purpose Graphics Processing Units (GPGPUs). We describe the challenges involved in a GPGPU implementation of the VP9 deblocking filter and introduce a novel thread-dispatching approach that addresses these parallelization challenges. This approach has been successfully implemented and productized in the VP9 decoder and encoder solutions on Intel GPUs. To further improve the parallelism of the deblocking algorithm itself, Intel and Microsoft jointly propose an improved in-loop deblocking algorithm and processing flow for the upcoming AV1 codec standard, developed by the Alliance for Open Media (AOM). We describe this algorithm and evaluate its impact on quality relative to the current state-of-the-art AV1 reference codec.
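The abstract does not spell out the thread-dispatching scheme, so the following is only a generic illustration of why data dependencies limit parallelism. It is a minimal Python sketch under an assumed, simplified dependency model in which each 64x64 superblock can be filtered only after its left and top neighbours have finished. This is not VP9's exact dependency rule and not the authors' method. Under this assumption, superblocks on the same anti-diagonal (equal `r + c`) are independent and can run concurrently. The function names `wavefront_schedule` and `respects_dependencies` are hypothetical.

```python
from collections import defaultdict
from math import ceil


def wavefront_schedule(rows: int, cols: int) -> list[list[tuple[int, int]]]:
    """Group superblock coordinates into waves under an ASSUMED model:
    block (r, c) depends only on (r, c - 1) and (r - 1, c).
    All blocks in one wave share r + c, so they are mutually independent."""
    waves = defaultdict(list)
    for r in range(rows):
        for c in range(cols):
            waves[r + c].append((r, c))
    return [waves[d] for d in range(rows + cols - 1)]


def respects_dependencies(schedule: list[list[tuple[int, int]]]) -> bool:
    """Check that every block's left and top neighbours run in an earlier wave."""
    wave_of = {blk: i for i, wave in enumerate(schedule) for blk in wave}
    for (r, c), i in wave_of.items():
        for dep in ((r, c - 1), (r - 1, c)):
            if dep in wave_of and wave_of[dep] >= i:
                return False
    return True


if __name__ == "__main__":
    sb = 64  # superblock size in pixels
    rows, cols = ceil(1080 / sb), ceil(1920 / sb)  # 17 x 30 superblocks for 1080p
    sched = wavefront_schedule(rows, cols)
    # Number of sequential waves, widest wave, total superblocks
    print(len(sched), max(len(w) for w in sched), sum(len(w) for w in sched))
    print(respects_dependencies(sched))
```

For a 1920x1080 frame this model yields 46 sequential waves, and at most 17 of the 510 superblocks can be processed at once. That small degree of parallelism, compared with the thousands of threads a GPU can keep in flight, is the kind of bottleneck the abstract attributes to the filter's data dependencies.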
