CNN Filter for RPR-Based SR in VVC with Wavelet Decomposition

Hui Lan, Cheolkon Jung, Yang Liu, Ming Li

doi:10.1109/icassp49357.2023.10096013

Abstract

In this paper, we propose a convolutional neural network (CNN) filter for reference picture resampling (RPR)-based super-resolution (SR) with wavelet decomposition. The proposed CNN filter takes three inputs for RPR-based SR: the low-resolution (LR) reconstructed frame (Rec_LR), the LR prediction frame (Pre_LR), and the high-resolution (HR) RPR upsampled frame (RPR_output). Thus, the proposed CNN filter not only learns a mapping function between LR and HR images, but also effectively removes blocking artifacts in the reconstructed frame. We adopt wavelet decomposition to make RPR_output the same size as Rec_LR and Pre_LR, and to capture the relationship between its high-frequency (HF) and low-frequency (LF) components. To maximize feature reuse with a limited number of parameters, we design a residual spatial and channel attention block (RSCB) that combines residual blocks with spatial attention and channel attention to learn weighted local and global information in different receptive fields. Experimental results show that the proposed CNN filter achieves BD-rate changes of -8.98% and -4.05% (bitrate savings) on the Y channel under the all-intra (AI) and random-access (RA) configurations, respectively, over the VTM-11.0_NNVC-2.0 anchor.
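To illustrate how wavelet decomposition can bring an HR frame down to the LR frame size while separating LF and HF content, the following is a minimal sketch of a one-level 2-D transform. It assumes a Haar wavelet and a 2x scaling ratio between LR and HR frames; the abstract states neither, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical, not taken from the authors' implementation.

```python
# Sketch only: a one-level 2-D Haar transform, assumed here for illustration.
# The abstract does not say which wavelet or scaling ratio the authors use.

def haar_dwt2(frame):
    """Split a 2H x 2W frame (list of rows) into four H x W sub-bands."""
    h, w = len(frame), len(frame[0])
    if h % 2 or w % 2:
        raise ValueError("frame height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for y in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for x in range(0, w, 2):
            a, b = frame[y][x], frame[y][x + 1]          # top row of 2x2 block
            c, d = frame[y + 1][x], frame[y + 1][x + 1]  # bottom row of block
            r_ll.append((a + b + c + d) / 2)  # LF: scaled block average
            r_lh.append((a - b + c - d) / 2)  # HF: difference across columns
            r_hl.append((a + b - c - d) / 2)  # HF: difference across rows
            r_hh.append((a - b - c + d) / 2)  # HF: diagonal difference
        ll.append(r_ll); lh.append(r_lh); hl.append(r_hl); hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, rebuilding the 2H x 2W frame from the sub-bands."""
    frame = []
    for y in range(len(ll)):
        top, bottom = [], []
        for x in range(len(ll[0])):
            s, p, q, r = ll[y][x], lh[y][x], hl[y][x], hh[y][x]
            top += [(s + p + q + r) / 2, (s - p + q - r) / 2]
            bottom += [(s + p - q - r) / 2, (s - p - q + r) / 2]
        frame.append(top)
        frame.append(bottom)
    return frame
```

Under these assumptions, each of the four sub-bands has the same height and width as Rec_LR and Pre_LR, so they could be stacked with those frames as input channels. Because the transform is invertible, no information from RPR_output is discarded by the size reduction.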

Full Text