Fast Non-Local Adaptive In-Loop Filter Optimization on GPU

Chuanmin Jia,Xinfeng Zhang,Falei Luo,Shiqi Wang,Siwei Ma,Shanshe Wang

doi:10.1109/tmm.2020.2981185

Abstract

The non-local adaptive in-loop filter (NALF) for video coding has achieved significant coding gain by exploiting image non-local self-similarity (NSS) to efficiently reduce the compression artifacts. However, the intensive computation of NALF hinders its practical deployment in video standardizations. In this paper, we propose a fast NALF optimization algorithm in parallel-computing framework by leveraging the massive parallel execution resources of GPU. First, the computational complexity of original NALF is analyzed in depth, then the pipelines of computational-intensive modules are re-designed to adapt to the general-purpose GPU with more parallel-friendly consideration. Specifically, we speed up the NALF by optimizing thread allocation to maximize the parallelism degree and elaborately designing the GPU block dimension to avoid access conflict. The group-level and pixel-level parallelization for collaboratively filtering and patch matching modules are designed respectively. To reduce the cost in data transmission, the whole filtering process is implemented on GPU by taking the advantage of low data dependency in NALF. Extensive experimental results show that the proposed fast NALF optimization using GPU architecture achieves high-speeed processing while maintaining the significant coding performance of original NALF, which shows the potential of NALF in the future video coding standard.

Full Text