NN-Based In-Loop Filtering With Inputs Transformed
The state-of-the-art neural network-based (NN-based) in-loop filters for video coding are built on convolutional neural networks. The Joint Video Experts Team (JVET) activities investigate NN-based in-loop filters for two operation points: the high operation point (HOP), which provides the highest possible gains at high complexity, and the low operation point (LOP), which is constrained to low complexity. This paper focuses on the LOP network. We apply a DCT and reshaping to the inputs and an inverse DCT and inverse reshaping to the outputs of LOP. The spatial resolution inside the network is reduced by a factor of four while the final output still has the same number of pixels. The complexity in MAC/pixel (multiply-accumulate operations per pixel) is therefore also reduced by a factor of four. This freed-up complexity is instead spent on increasing the number of backbone blocks and channels so that the LOP complexity is matched. Our network has a complexity of 16.9 kMAC/pixel and 0.2 M parameters (LOP: 17 kMAC/pixel, 0.05 M parameters). The BD-rate impact compared to the NNVC-7.1 anchor is −0.48% for RA and −0.17% for AI with the float model, and −0.44% for RA and −0.18% for AI with the integer model.
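The input transform described above can be illustrated with a minimal NumPy sketch. It assumes a 2×2 block DCT combined with a space-to-depth reshaping (the abstract does not state the exact block size or reshaping used in the network), which reduces the spatial resolution by a factor of four while stacking the four DCT coefficients per block as channels, so the total number of samples is preserved:

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def space_to_depth_dct(x, b=2):
    """Split an HxW frame into bxb blocks, DCT each block, and stack the
    coefficients as channels: (H, W) -> (H/b, W/b, b*b).
    Spatial resolution drops by b*b = 4; sample count is unchanged."""
    h, w = x.shape
    d = dct2_matrix(b)
    blocks = x.reshape(h // b, b, w // b, b).transpose(0, 2, 1, 3)
    # Per-block 2-D DCT: D @ block @ D^T
    coeffs = np.einsum('ij,hwjk,lk->hwil', d, blocks, d)
    return coeffs.reshape(h // b, w // b, b * b)

def inverse_space_to_depth_dct(y, b=2):
    """Inverse DCT per block, then depth-to-space back to (H, W)."""
    hb, wb, _ = y.shape
    d = dct2_matrix(b)
    blocks = y.reshape(hb, wb, b, b)
    # Per-block inverse 2-D DCT: D^T @ coeffs @ D
    rec = np.einsum('ji,hwjk,kl->hwil', d, blocks, d)
    return rec.transpose(0, 2, 1, 3).reshape(hb * b, wb * b)
```

Because the transform is orthonormal and invertible, the network operates on a four-channel, quarter-resolution representation, and the inverse transform restores the full-resolution output losslessly.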