Abstract
The versatility of recent machine learning approaches makes them well suited to improving next-generation video compression solutions. Unfortunately, these approaches typically bring significant increases in computational complexity and are difficult to interpret as explainable models, which limits their potential for implementation within practical video coding applications. This paper introduces a novel explainable neural network-based inter-prediction scheme that improves the interpolation of reference samples needed for fractional-precision motion compensation. The approach requires only a single neural network to be trained; because the network has a linear structure, it is easily interpretable, and a full quarter-pixel interpolation filter set can be derived from it. A novel training framework enables each network branch to resemble a specific fractional shift. This makes the solution practical and very efficient to use alongside conventional video coding schemes. When implemented in the context of the state-of-the-art Versatile Video Coding (VVC) test model, average BD-rate savings of 0.77%, 1.27% and 2.25% can be achieved for lower-resolution sequences under the random access, low-delay B and low-delay P configurations, respectively, while the complexity of the learned interpolation schemes is significantly reduced compared to interpolation with full CNNs.
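The link between a linear structure and a derivable filter can be illustrated with a minimal sketch. It assumes a one-dimensional, single-channel stack of two convolution layers with no activation functions and no bias terms; the kernel values and the names `layer1`, `layer2` and `collapsed` are hypothetical and are not the coefficients or the architecture learned in the paper. Under these assumptions, applying the layers one after another gives the same result as applying one filter obtained by convolving the layer kernels together.

```python
def convolve(signal, kernel):
    """Full 1-D discrete convolution of two integer sequences."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


# Hypothetical integer kernels standing in for two linear layers.
layer1 = [1, 2, 1]
layer2 = [-1, 5, 5, -1]

# Hypothetical row of reference samples.
samples = [3, 1, 4, 1, 5, 9, 2, 6]

# Path A: run the samples through both layers in sequence.
layered = convolve(convolve(samples, layer1), layer2)

# Path B: collapse the layers into one filter first, then apply it once.
collapsed = convolve(layer1, layer2)
direct = convolve(samples, collapsed)

# Convolution is associative, so both paths agree exactly.
assert layered == direct
print(collapsed)  # [-1, 3, 14, 14, 3, -1]
```

The collapsed taps sum to 32, so in this toy case a normalisation by 32 would give a unit-gain filter. Any non-linear activation between the layers would break this equivalence, which is why the sketch only applies to a purely linear stack.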
Highlights
To meet the increasing demands for video content at better qualities and higher resolutions, new video compression technology is being researched and developed.
The training dataset for the proposed approach of fractional interpolation using a Convolutional Neural Network (CNN) is created by encoding the video sequence BlowingBubbles within the Versatile Video Coding (VVC) Test Model (VTM) 6.0 [24], under the Random Access (RA) configuration.
If the input block is on the boundary of the frame, repetitive padding is applied to add the required pixels, a typical approach used in many MPEG standards [25].
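The repetitive padding mentioned above can be sketched as follows. This is a minimal illustration, not the VTM implementation: the function name `extract_with_repetitive_padding`, the `margin` parameter and the toy frame are all assumptions made for the example. Coordinates that fall outside the frame are clamped to the nearest valid position, which is equivalent to repeating the border samples.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def extract_with_repetitive_padding(frame, top, left, height, width, margin):
    """Return the block at (top, left) extended by `margin` samples per side.

    Positions outside the frame take the value of the nearest frame sample,
    so border rows and columns are repeated as often as needed.
    """
    rows, cols = len(frame), len(frame[0])
    window = []
    for y in range(top - margin, top + height + margin):
        src_y = clamp(y, 0, rows - 1)
        window.append([
            frame[src_y][clamp(x, 0, cols - 1)]
            for x in range(left - margin, left + width + margin)
        ])
    return window


# Hypothetical 3x4 frame of sample values.
frame = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]

# A 2x2 block in the top-left corner, extended by one sample on each side.
padded = extract_with_repetitive_padding(
    frame, top=0, left=0, height=2, width=2, margin=1
)
for row in padded:
    print(row)
```

In this example the first row and first column of the frame are each repeated once, because the extended window reaches one sample beyond the top and left frame boundaries.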
Summary
To meet the increasing demands for video content at better qualities and higher resolutions, new video compression technology is being researched and developed. Deep Convolutional Neural Network (CNN) approaches have been successful in solving fundamental and challenging computer vision problems such as single image super-resolution [2], image classification, object detection [3], and colourisation [4]. Such approaches are typically very complex, while video compression applications require a careful design of the coding tools in order to meet the strict complexity requirements posed by practical encoder and decoder implementations. More advanced video coding solutions such as VVC may employ higher sub-pixel precision in certain circumstances, e.g. affine motion [7]. In addition to these approaches, in which filters are designed based on well-known signals, new research has recently emerged in which machine learning (ML) is used to derive or refine the sub-pixel samples. Several methods based on CNNs have been proposed that improve either the input to the interpolation process, by fusing the reference blocks in bi-prediction in a non-linear way [11], [12], or the output of the process, by utilising the neighbouring reconstructed region to refine the prediction [13].