Improved CNN-Based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding

Luka Murn, Alan Smeaton, Saverio Blasi, Marta Mrak

doi:10.1109/ojsp.2021.3089439


Open Access

https://doi.org/10.1109/ojsp.2021.3089439


Journal: IEEE Open Journal of Signal Processing	Publication Date: Jan 1, 2021
Citations: 16	License type: CC BY-NC-ND 4.0

Affiliation: Dublin City University

Abstract

The versatility of recent machine learning approaches makes them ideal for improvement of next generation video compression solutions. Unfortunately, these approaches typically bring significant increases in computational complexity and are difficult to interpret into explainable models, affecting their potential for implementation within practical video coding applications. This paper introduces a novel explainable neural network-based inter-prediction scheme, to improve the interpolation of reference samples needed for fractional precision motion compensation. The approach requires a single neural network to be trained from which a full quarter-pixel interpolation filter set is derived, as the network is easily interpretable due to its linear structure. A novel training framework enables each network branch to resemble a specific fractional shift. This practical solution makes it very efficient to use alongside conventional video coding schemes. When implemented in the context of the state-of-the-art Versatile Video Coding (VVC) test model, 0.77%, 1.27% and 2.25% BD-rate savings can be achieved on average for lower resolution sequences under the random access, low-delay B and low-delay P configurations, respectively, while the complexity of the learned interpolation schemes is significantly reduced compared to the interpolation with full CNNs.
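The claim that a linear network can be read off as an interpolation filter rests on a general property: convolution layers stacked without activations or biases compose into a single convolution. The following is a minimal 1-D sketch of that property only, not the authors' architecture or training framework. The kernels `layer1` and `layer2` and the function names are hypothetical values chosen for illustration.

```python
def convolve_full(a, b):
    """Full 1-D convolution of two sequences (pure Python)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def collapse_layers(kernels):
    """Compose activation-free, bias-free 1-D conv layers into one kernel."""
    combined = [1.0]  # identity kernel
    for kernel in kernels:
        combined = convolve_full(combined, kernel)
    return combined


def apply_valid(signal, kernel):
    """'Valid' convolution: outputs only where the kernel fully overlaps."""
    flipped = kernel[::-1]
    count = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + t] * flipped[t] for t in range(len(kernel)))
        for i in range(count)
    ]


# Hypothetical layer weights (assumption, not taken from the paper).
layer1 = [0.25, 0.5, 0.25]
layer2 = [-0.5, 2.0, -0.5]
samples = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]

# Running the two layers in sequence ...
layered = apply_valid(apply_valid(samples, layer1), layer2)
# ... matches one pass with the collapsed 5-tap kernel.
single_filter = collapse_layers([layer1, layer2])
collapsed = apply_valid(samples, single_filter)
```

In this sketch `layered` and `collapsed` agree up to floating-point rounding, so at inference time only the single collapsed kernel needs to be applied, which is the kind of complexity saving the abstract describes.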

Highlights

To meet the increasing demands for video content at better qualities and higher resolutions, new video compression technology is being researched and developed
The training dataset for the proposed approach of fractional interpolation using Convolutional Neural Network (CNN) is created by encoding the video sequence BlowingBubbles within Versatile Video Coding (VVC) Test Model (VTM) 6.0 [24], under the Random Access (RA) configuration
If the input block is on the boundary of the frame, repetitive padding is applied to add the required pixels, a typical approach used in many MPEG standards [25]
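Repetitive padding, as mentioned in the last highlight, extends a frame by repeating its nearest edge sample, so that blocks on the boundary still have the neighbouring samples a filter needs. The sketch below illustrates the general technique. The function name `pad_repetitive` and the pad width are assumptions for illustration, since the excerpt does not state how many samples are added.

```python
def pad_repetitive(frame, pad):
    """Extend a 2-D frame by `pad` samples per side, repeating edge samples."""
    height, width = len(frame), len(frame[0])
    padded = []
    for y in range(-pad, height + pad):
        # Clamp the row index so rows outside the frame reuse the edge row.
        src_y = min(max(y, 0), height - 1)
        row = [
            # Clamp the column index in the same way.
            frame[src_y][min(max(x, 0), width - 1)]
            for x in range(-pad, width + pad)
        ]
        padded.append(row)
    return padded


# Hypothetical 2x2 frame padded by one sample on every side.
tiny_frame = [[1, 2],
              [3, 4]]
padded_frame = pad_repetitive(tiny_frame, 1)
# padded_frame is:
# [[1, 1, 2, 2],
#  [1, 1, 2, 2],
#  [3, 3, 4, 4],
#  [3, 3, 4, 4]]
```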

Summary

Introduction

To meet the increasing demands for video content at better qualities and higher resolutions, new video compression technology is being researched and developed. Deep Convolutional Neural Network (CNN) approaches have been successful in solving tasks such as single image super-resolution [2], image classification, object detection [3], and colourisation [4], all of which are fundamental and challenging computer vision problems. Such approaches are typically very complex, while video compression applications require a careful design of the coding tools in order to meet the strict complexity requirements posed by practical encoder and decoder implementations. More advanced video coding solutions such as VVC may employ higher sub-pixel precision in certain circumstances, e.g. affine motion [7]. In addition to these approaches, where filters were designed based on well-known signals, new research has recently emerged where ML is used to derive or refine the sub-pixel samples. Several methods based on CNNs have been proposed that improve either the input to the interpolation process, by fusing the reference blocks in bi-prediction in a non-linear way [11], [12], or the output of the process, by utilising the neighbouring reconstructed region to refine the prediction [13].



