Abstract

The non-local adaptive in-loop filter (NALF) for video coding achieves significant coding gain by exploiting image non-local self-similarity (NSS) to reduce compression artifacts efficiently. However, its intensive computation hinders practical deployment in video coding standards. In this paper, we propose a fast NALF optimization algorithm in a parallel-computing framework that leverages the massively parallel execution resources of the GPU. First, the computational complexity of the original NALF is analyzed in depth; then the pipelines of the computationally intensive modules are redesigned to suit the general-purpose GPU, with parallel-friendly execution as the main consideration. Specifically, we speed up NALF by optimizing thread allocation to maximize the degree of parallelism and by carefully designing the GPU block dimensions to avoid access conflicts. Group-level parallelization is designed for the collaborative filtering module, and pixel-level parallelization for the patch matching module. To reduce the cost of data transmission, the whole filtering process is implemented on the GPU, taking advantage of the low data dependency in NALF. Extensive experimental results show that the proposed fast NALF optimization on the GPU architecture achieves high-speed processing while maintaining the significant coding performance of the original NALF, which shows the potential of NALF for future video coding standards.
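
To illustrate why patch matching lends itself to pixel-level parallelization, the following is a minimal CPU sketch in Python, not the authors' GPU implementation. It assumes a sum-of-squared-differences (SSD) similarity measure, a square search window, and a fixed group size k; none of these are specified in the abstract, and the names patch_ssd and match_patches are hypothetical. The point it shows is that each candidate position is scored using only read-only image data, so each iteration of the search loop could in principle be assigned to its own GPU thread.

```python
def patch_ssd(img, ay, ax, by, bx, size):
    """Sum of squared differences between two size x size patches.

    (ay, ax) and (by, bx) are the top-left corners of the two patches.
    SSD is an assumed similarity measure, chosen for illustration only.
    """
    total = 0
    for dy in range(size):
        for dx in range(size):
            d = img[ay + dy][ax + dx] - img[by + dy][bx + dx]
            total += d * d
    return total


def match_patches(img, ry, rx, size=2, radius=2, k=3):
    """Return the k candidate positions most similar to the reference patch.

    Every candidate (cy, cx) is scored independently of the others and only
    reads from img, so on a GPU each candidate could map to one thread.
    The reference patch itself is a candidate and always scores 0.
    """
    h, w = len(img), len(img[0])
    scores = []
    for cy in range(max(0, ry - radius), min(h - size, ry + radius) + 1):
        for cx in range(max(0, rx - radius), min(w - size, rx + radius) + 1):
            scores.append((patch_ssd(img, ry, rx, cy, cx, size), cy, cx))
    scores.sort()  # ties are broken by position, so the result is deterministic
    return [(cy, cx) for _, cy, cx in scores[:k]]


if __name__ == "__main__":
    # Small periodic test image: the 2x2 pattern repeats every two pixels.
    image = [
        [1, 2, 1, 2],
        [3, 4, 3, 4],
        [1, 2, 1, 2],
        [3, 4, 3, 4],
    ]
    print(match_patches(image, 0, 0))
```

The sorted selection at the end is the only step that combines results across candidates; in a GPU version it would correspond to a reduction or partial sort after the independent per-candidate scoring.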
