Articles published on Inter Prediction
- Research Article
- 10.1109/access.2025.3587255
- Jan 1, 2025
- IEEE Access
- Minhun Lee + 1 more
The Versatile Video Coding (VVC) standard adopts the geometric partition mode (GPM) to enhance inter prediction by supporting non-rectangular partitions in a block. The GPM improves coding efficiency by effectively handling arbitrary partitions caused by the motion of irregularly shaped objects in a scene. However, the blending process of GPM prediction has limitations: it treats the blending ranges of both predictors equally, disregarding the distinct visual characteristics of diverse video content, which leads to suboptimal performance. This study presents a comprehensive approach using adaptive and asymmetrical blending for the GPM (AAGPM) that enhances the blending technique by introducing two novel blending methods: a difference-based adaptive blending method and an implicitly asymmetrical blending mechanism. The difference-based method enhances prediction accuracy by adaptively assigning higher blending weights to the dominant predictors. In addition, the asymmetrical mechanism adaptively adjusts the blending ranges without additional signaling by analyzing local and global characteristics derived from the reconstructed signal and the predictors. These methods enable adaptive, content-aware adjustment of the blending matrices, improve edge quality by preserving intrinsic edges, and prevent over-smoothing in complex sequences across diverse video content, including camera-captured, computer-generated, and mixed content. The proposed algorithm is implemented in both the VVC reference software (VTM-18.0) and the enhanced compression model (ECM-6.0). Experimental results demonstrate that the algorithm achieves Bjøntegaard delta bitrate (BD-rate) savings of 0.86% and 0.50% over VVC for the low-delay B and random-access configurations, respectively. It also improves compression performance over ECM.
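As a rough illustration of what GPM blending does, the sketch below computes a soft weight from each sample's signed distance to a straight partition line and mixes two predictions with it. It is a floating-point simplification, not VVC's integer weight derivation and not the AAGPM method; the function names and the separate widths `width_a` and `width_b` (a stand-in for the idea of asymmetric blending ranges) are hypothetical.

```python
import math

# Illustrative sketch only: not VVC's normative GPM weights, not AAGPM.


def gpm_weight(x, y, angle_deg, offset, width_a=2.0, width_b=2.0):
    """Weight of predictor A at sample (x, y) for a straight partition line.

    The line is x*cos(t) + y*sin(t) = offset. On A's side (d >= 0) the weight
    ramps from 0.5 at the line up to 1.0 at distance width_a; on B's side it
    ramps down to 0.0 at distance width_b. Unequal widths give an asymmetric
    blending range (a hypothetical parameterization).
    """
    if width_a <= 0 or width_b <= 0:
        raise ValueError("blending widths must be positive")
    t = math.radians(angle_deg)
    d = x * math.cos(t) + y * math.sin(t) - offset
    if d >= 0:
        return min(1.0, 0.5 + 0.5 * d / width_a)
    return max(0.0, 0.5 + 0.5 * d / width_b)


def gpm_blend(pred_a, pred_b, angle_deg, offset, width_a=2.0, width_b=2.0):
    """Blend two equally sized prediction blocks given as lists of rows."""
    out = []
    for y, (row_a, row_b) in enumerate(zip(pred_a, pred_b)):
        out_row = []
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            w = gpm_weight(x, y, angle_deg, offset, width_a, width_b)
            out_row.append(w * pa + (1.0 - w) * pb)
        out.append(out_row)
    return out


a = [[100] * 8 for _ in range(8)]
b = [[0] * 8 for _ in range(8)]
print(gpm_blend(a, b, angle_deg=0, offset=3.5)[0])
# [0.0, 0.0, 12.5, 37.5, 62.5, 87.5, 100.0, 100.0]
```

Passing `width_b=1.0` narrows the ramp on B's side only, so the transition becomes sharper there while A's side stays unchanged.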
- Research Article
1
- 10.33851/jmis.2024.11.3.175
- Sep 30, 2024
- Journal of Multimedia Information System
- Qurat Ul Ain Aisha + 5 more
A sophisticated video surveillance system faces the problem of limited storage when recording video data over long periods. Video compression technology is an effective solution to this problem. Inspired by the success of neural network-based approaches in computer vision, research on neural network-based video coding has emerged. Inter prediction plays a crucial role in improving the compression efficiency of neural network-based video coding. In this paper, we propose a convolutional neural network (CNN)-based generation and enhancement method for inter prediction (GEIP) in the Versatile Video Coding (VVC) standard. By leveraging fused features and self-attended features based on an attention mechanism, the proposed method maximizes inter prediction performance. Compared with the VTM-11.0 NNVC-1.0 anchor, the proposed method achieves a BD-rate reduction of up to 7.06% on the Y component under the random access (RA) configuration.
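BD-rate figures such as the 7.06% reduction quoted here compare two rate-distortion curves. Below is a minimal pure-Python sketch of the classic Bjøntegaard calculation, assuming at least four (bitrate, PSNR) points per codec. It fits log10(bitrate) as a cubic in PSNR, averages the gap between the two fits over the overlapping PSNR range, and converts the result back to a percentage. The function names are hypothetical, and some reference implementations use piecewise cubic interpolation instead of a single polynomial fit, so their numbers can differ slightly.

```python
import math

# Illustrative sketch of the classic Bjontegaard delta-rate; names are hypothetical.


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _cubic_fit(xs, ys):
    """Least-squares cubic c0 + c1 x + c2 x^2 + c3 x^3 (exact for 4 points)."""
    rows = [[x ** k for k in range(4)] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    return _solve(ata, aty)


def _poly_integral(c, lo, hi):
    """Definite integral of the polynomial with coefficients c over [lo, hi]."""
    def antideriv(x):
        return sum(ck * x ** (k + 1) / (k + 1) for k, ck in enumerate(c))
    return antideriv(hi) - antideriv(lo)


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Average bitrate difference in percent; negative means the test saves bits."""
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    mid = (lo + hi) / 2  # centre PSNR values so the fit stays well conditioned
    ca = _cubic_fit([p - mid for p in anchor_psnr], [math.log10(r) for r in anchor_rates])
    ct = _cubic_fit([p - mid for p in test_psnr], [math.log10(r) for r in test_rates])
    gap = _poly_integral(ct, lo - mid, hi - mid) - _poly_integral(ca, lo - mid, hi - mid)
    return (10 ** (gap / (hi - lo)) - 1) * 100
```

A test codec that needs exactly 90% of the anchor's bitrate at every PSNR point yields a BD-rate of -10%, since the two log-rate curves differ by the constant log10(0.9).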
- Research Article
26
- 10.1109/tcsvt.2024.3360248
- Jul 1, 2024
- IEEE Transactions on Circuits and Systems for Video Technology
- Xihua Sheng + 3 more
Video compression performance is closely related to the accuracy of inter prediction. It tends to be difficult to obtain accurate inter prediction for the local video regions with inconsistent motion and occlusion. Traditional video coding standards propose various technologies to handle motion inconsistency and occlusion, such as recursive partitions, geometric partitions, and long-term references. However, existing learned video compression schemes focus on obtaining an overall minimized prediction error averaged over all regions while ignoring the motion inconsistency and occlusion in local regions. In this paper, we propose a spatial decomposition and temporal fusion based inter prediction for learned video compression. To handle motion inconsistency, we propose to decompose the video into structure and detail (SDD) components first. Then we perform SDD-based motion estimation and SDD-based temporal context mining for the structure and detail components to generate short-term temporal contexts. To handle occlusion, we propose to propagate long-term temporal contexts by recurrently accumulating the temporal information of each historical reference feature and fuse them with short-term temporal contexts. With the SDD-based motion model and long short-term temporal contexts fusion, our proposed learned video codec can obtain more accurate inter prediction. Comprehensive experimental results demonstrate that our codec outperforms the reference software of H.266/VVC on all common test datasets for both PSNR and MS-SSIM.
- Research Article
26
- 10.1109/tcsvt.2023.3299410
- May 1, 2024
- IEEE Transactions on Circuits and Systems for Video Technology
- Jianghao Jia + 6 more
In video coding, inter prediction aims to reduce temporal redundancy by using previously encoded frames as references. The quality of reference frames is crucial to the performance of inter prediction. This paper presents a deep reference frame generation method to optimize the inter prediction in Versatile Video Coding (VVC). Specifically, reconstructed frames are sent to a well-designed frame generation network to synthesize a picture similar to the frame currently being encoded. The synthesized picture serves as an additional reference frame inserted into the reference picture list (RPL) to provide a more reliable reference for subsequent motion estimation (ME) and motion compensation (MC). The frame generation network employs optical flow to predict motion precisely. Moreover, an optical flow reorganization strategy is proposed to enable bi-directional and uni-directional predictions with only a single network architecture. To reasonably apply our method to VVC, we further introduce a normative modification of the temporal motion vector prediction (TMVP). Integrated into the VVC reference software VTM-15.0, the deep reference frame generation method achieves coding efficiency improvements of 5.22%, 3.61%, and 3.83% for the Y component under random access (RA), low delay B (LDB), and low delay P (LDP) configurations, respectively. The proposed method has been discussed at Joint Video Experts Team (JVET) meetings and is currently part of the Exploration Experiments (EE) for further study.
- Research Article
10
- 10.1109/tip.2024.3446228
- Jan 1, 2024
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
- Philipp Merkle + 5 more
This paper presents a convolutional neural network (CNN)-based enhancement to inter prediction in Versatile Video Coding (VVC). Our approach aims at improving the prediction signal of inter blocks with a residual CNN that incorporates spatial and temporal reference samples. It is motivated by the theoretical consideration that neural network-based methods have a higher degree of signal adaptivity than conventional signal processing methods and that spatially neighboring reference samples have the potential to improve the prediction signal by adapting it to the reconstructed signal in its immediate vicinity. We show that adding a polyphase decomposition stage to the CNN results in a significantly better trade-off between computational complexity and coding performance. Incorporating spatial reference samples in the inter prediction process is challenging: The fact that the input of the CNN for one block may depend on the output of the CNN for preceding blocks prohibits parallel processing. We solve this by introducing a novel signal plane that contains specifically constrained reference samples, enabling parallel decoding while maintaining a high compression efficiency. Overall, experimental results show average bit rate savings of 4.07% and 3.47% for the random access (RA) and low-delay B (LB) configurations of the JVET common test conditions, respectively.
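Polyphase decomposition (a space-to-depth rearrangement) splits a block into four half-resolution planes, so a CNN can process a quarter of the spatial positions per plane. The sketch below assumes a 2×2 factor and an even-sized block. The abstract does not specify the exact arrangement or where the stage sits in the network, and the function names are hypothetical.

```python
# Illustrative 2x2 polyphase split/merge; names are hypothetical.


def polyphase_split(block):
    """Split an H x W block (H, W even) into four (H/2) x (W/2) phase planes.

    Plane order: (even row, even col), (even, odd), (odd, even), (odd, odd).
    """
    return [
        [row[dx::2] for row in block[dy::2]]
        for dy in (0, 1)
        for dx in (0, 1)
    ]


def polyphase_merge(planes):
    """Inverse of polyphase_split: interleave the four planes back together."""
    h2, w2 = len(planes[0]), len(planes[0][0])
    out = [[0] * (2 * w2) for _ in range(2 * h2)]
    for idx, plane in enumerate(planes):
        dy, dx = divmod(idx, 2)  # same (dy, dx) order as polyphase_split
        for y in range(h2):
            for x in range(w2):
                out[2 * y + dy][2 * x + dx] = plane[y][x]
    return out
```

Because the merge inverts the split exactly, no information is lost. The network simply sees each phase as a separate channel at half resolution, which is where the complexity saving comes from.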
- Research Article
5
- 10.1007/s11042-023-17481-5
- Nov 6, 2023
- Multimedia Tools and Applications
- Rana Jassem + 3 more
Inter prediction multiple reference frames impact on H266-VVC encoder
- Research Article
48
- 10.1109/tcsvt.2022.3233221
- Jul 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
- Kai Lin + 5 more
Inter prediction is a critical component of the hybrid coding framework for dealing with temporal redundancy. Most neural video coding methods follow the motion-compensation-based inter coding scheme, in which the motion vector (MV) plays the central role. In this paper, we propose an efficient motion modeling approach that inherently decomposes motion into two components: the intrinsic motion and the compensatory motion. The intrinsic motion originates from the implicit spatiotemporal context hidden in the historical sequence, which can be intuitively captured without spending bits. On top of it, the compensatory motion plays the role of structural refinement and texture enhancement in the form of side information. In particular, the inter prediction is performed in the feature space as a progressive temporal transition, conditioned on the decomposed motion. Through this motion decomposition paradigm, we address the questions of motion representation, compensation, and coding in the learned video compression framework. With the temporal prediction, the remaining pixel residue is signaled to obtain the reconstruction. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art coding performance on par with other end-to-end coding methods, and outperforms versatile video coding (VVC) under the low-delay P (LDP) configuration in terms of the MS-SSIM metric.
- Research Article
5
- 10.3390/app13052795
- Feb 22, 2023
- Applied Sciences
- Yanhan Chu + 3 more
Inter prediction is a crucial part of hybrid video coding frameworks, and it is used to eliminate redundancy between adjacent frames and improve coding performance. During inter prediction, motion estimation is used to find the reference block that is most similar to the current block, and subsequent motion compensation shifts the reference block by a fractional amount to obtain the prediction block. The closer the reference block is to the original block, the higher the coding efficiency. To improve the quality of reference blocks, a quality enhancement network dedicated to reference blocks (RBENN) is proposed. The main body of the network consists of 10 residual modules, with two convolution layers for preprocessing and feature extraction. Each residual module consists of two convolutional layers, one ReLU activation, and a shortcut. The network takes the luma reference block as input before motion compensation, and the enhanced reference block is then filtered by the default fractional interpolation. Moreover, the proposed method can be used for both conventional motion compensation and affine motion compensation. Experimental results showed that RBENN achieves a −1.35% BD-rate on average under the low-delay P (LDP) configuration compared with the latest H.266/VVC.
- Research Article
9
- 10.56578/ataiml010202
- Dec 31, 2022
- Acadlore Transactions on AI and Machine Learning
- Helen K Joy + 1 more
Video compression has gained relevance with the rise of the internet, mobile phones, variable-resolution acquisition devices, and so on. Redundant information is exploited in the initial stage of compression, that is, prediction. Inter prediction, that is, prediction across frames, incurs high computational complexity when implemented with traditional signal processing procedures. The paper proposes the design of a deep convolutional neural network model that performs inter prediction while overcoming the flaws of the traditional method. It describes the network model, the mathematics behind each stage, and the evaluation of the proposed model on a sample dataset. The input is a 64×64 coding tree unit (CTU) of the video frame, which the model converts and stores as a 16-element vector using the CNN. It also gives an overview of the deep depth decision algorithm. The evaluation shows that the model achieves better compression with lower computational complexity.
- Research Article
27
- 10.1109/tcsvt.2021.3107135
- Jun 1, 2022
- IEEE Transactions on Circuits and Systems for Video Technology
- Dengchao Jin + 5 more
In video coding, handling scenes with complex motion, such as rotation and zooming, is challenging. Although affine motion compensation (AMC) is employed in Versatile Video Coding (VVC), it is still difficult to handle non-translational motion because of the hand-crafted block-based motion compensation it adopts. In this paper, we propose a deep affine motion compensation network (DAMC-Net) for inter prediction in video coding to effectively improve prediction accuracy. To the best of our knowledge, our work is the first attempt to handle deformable motion compensation with a CNN in VVC. Specifically, a deformable motion-compensated prediction (DMCP) module is proposed to compensate the current encoding block in a learnable way by estimating accurate motion fields. Meanwhile, the spatial neighboring information, the temporal reference block, and the initial motion field are fully exploited. By effectively fusing the multi-channel feature maps from DMCP, an attention-based fusion and reconstruction (AFR) module is designed to reconstruct the output block. The proposed DAMC-Net is integrated into VVC, and the experimental results demonstrate that the proposed method considerably enhances coding performance.
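For context on the affine motion compensation (AMC) that DAMC-Net builds on: VVC's 4-parameter affine model derives one motion vector per 4×4 subblock from two control-point MVs at the block's top-left and top-right corners, evaluated at each subblock's centre. The sketch below uses floating-point arithmetic for readability, whereas the standard works in fixed-point 1/16-sample units. The 6-parameter variant, which adds a third control point, is omitted, and the function name is hypothetical.

```python
# Illustrative floating-point 4-parameter affine MV field; name is hypothetical.


def affine_subblock_mvs(mv0, mv1, width, height, sb=4):
    """Per-subblock MVs from control-point MVs mv0 (top-left) and mv1 (top-right).

    mv_x = a*x - b*y + mv0_x,  mv_y = b*x + a*y + mv0_y,
    with a = (mv1_x - mv0_x) / width and b = (mv1_y - mv0_y) / width,
    evaluated at each subblock centre. Returns a list of rows of (mv_x, mv_y).
    """
    a = (mv1[0] - mv0[0]) / width
    b = (mv1[1] - mv0[1]) / width
    field = []
    for y0 in range(0, height, sb):
        row = []
        for x0 in range(0, width, sb):
            cx, cy = x0 + sb / 2, y0 + sb / 2  # subblock centre
            row.append((a * cx - b * cy + mv0[0], b * cx + a * cy + mv0[1]))
        field.append(row)
    return field
```

Equal control-point MVs reduce the model to pure translation. A horizontal difference between them (`a != 0`) produces zoom, and a vertical difference (`b != 0`) produces rotation. This per-subblock, hand-crafted model is the kind of approximation that DAMC-Net replaces with learned deformable compensation.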
- Research Article
56
- 10.1109/tcsvt.2021.3100744
- Oct 1, 2021
- IEEE Transactions on Circuits and Systems for Video Technology
- Haitao Yang + 7 more
Efficient representation and coding of fine-granular motion information is one of the key research areas for exploiting inter-frame correlation in video coding. Representative techniques in this direction are affine motion compensation (AMC), decoder-side motion vector refinement (DMVR), and subblock-based temporal motion vector prediction (SbTMVP). Fine-granular motion information is derived at the subblock level for all three coding tools. In addition, the obtained inter prediction can be further refined by two optical flow-based coding tools: bi-directional optical flow (BDOF) for bi-directional inter prediction, and prediction refinement with optical flow (PROF), which is used exclusively in combination with AMC. The aforementioned five coding tools have been extensively studied and finally adopted in the Versatile Video Coding (VVC) standard. This paper presents technical details of each tool and highlights the design elements with consideration of typical hardware implementations. Following the common test conditions defined by the Joint Video Experts Team (JVET) for the development of VVC, the five tools achieve an average bitrate reduction of 5.7%. For test sequences characterized by large and complex motion, bitrate reductions of up to 13.4% are observed. Additionally, visual quality improvement is demonstrated and analyzed.
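DMVR, one of the five tools above, refines a bi-prediction MV pair by bilateral matching. Both MVs are moved by mirrored offsets, and the offset with the smallest SAD between the two reference blocks is selected. The sketch below covers only the integer-sample search over a ±2 window and assumes every accessed sample lies inside the frames. VVC additionally applies a sub-sample refinement and processes blocks in units of at most 16×16. The function names are hypothetical.

```python
# Illustrative integer-only bilateral matching; names are hypothetical.


def _block(frame, x, y, w, h):
    """Extract the w x h block whose top-left sample is (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


def _sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def dmvr_refine(ref0, ref1, x0, y0, x1, y1, w, h, search=2):
    """Return the mirrored integer offset (dx, dy) with the lowest SAD.

    (x0, y0) and (x1, y1) are the positions the initial MVs point to in the two
    reference frames; (dx, dy) is added to the first and subtracted from the
    second. Ties keep the first offset visited.
    """
    best_cost, best = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            b0 = _block(ref0, x0 + dx, y0 + dy, w, h)
            b1 = _block(ref1, x1 - dx, y1 - dy, w, h)
            cost = _sad(b0, b1)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best
```

Mirroring the offsets keeps the refined MV pair symmetric around the current picture. Because of that constraint, the decoder can repeat the same search without any extra signaling.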
- Research Article
4
- 10.24425/ijet.2021.135981
- May 25, 2021
- International Journal of Electronics and Telecommunications
- T Bernatin + 3 more
High-definition video transmission is one of the prime demands of modern-day communication. Changing needs require video codec standards to offer diverse features, and H.264 meets these requirements for video compression. In this work, inter prediction is optimized along with an improved intra prediction to minimize bit rates and thereby reduce the channel bandwidth, which is required in most wireless applications. In intra prediction, only the DC prediction mode is chosen out of the 9 modes for 4×4 luma blocks, which reduces coding complexity and enables optimal logic utilization so that a typical FPGA board can support the hardware implementation. Most significantly, inter prediction uses the M9K blocks efficiently with proper timing synchronization to reduce the latency of the encoding operation. An experimental setup comprising two Altera DE2-115 boards connected through an Ethernet cable demonstrated the video transmission. These optimized intra and inter prediction stages resulted in a significant improvement in video compression while maintaining good subjective quality.
- Research Article
15
- 10.1109/tcsvt.2021.3063165
- Mar 3, 2021
- IEEE Transactions on Circuits and Systems for Video Technology
- Yang Wang + 4 more
Inter prediction is a crucial part of hybrid video coding frameworks, utilized to exploit the temporal redundancy in video sequences and improve the coding performance. During inter prediction, a predicted block is typically derived from reference pictures using motion estimation and motion compensation. To improve the coding performance of inter prediction, a neural network based enhancement to inter prediction (NNIP) is proposed in this paper. NNIP is composed of three networks, namely a residue estimation network, a combination network, and a deep refinement network. Specifically, first, a residue estimation network is designed to estimate the residue between the current block and its predicted block using their available spatial neighbors. Second, the feature maps of the estimated residue and the predicted block are extracted and concatenated in a combination network. Finally, the concatenated feature maps are fed into a deep refinement network to generate a refined residue, which is added back to the predicted block to derive a more accurate predicted block. NNIP is integrated into HEVC to evaluate its efficiency. The experimental results demonstrate that NNIP achieves average BD-rate reductions of 4.6%, 3.0%, and 2.7% under the LDP, LDB, and RA configurations, respectively, compared to HEVC.
- Research Article
21
- 10.1109/ojsp.2021.3089439
- Jan 1, 2021
- IEEE Open Journal of Signal Processing
- Luka Murn + 3 more
The versatility of recent machine learning approaches makes them ideal for improvement of next generation video compression solutions. Unfortunately, these approaches typically bring significant increases in computational complexity and are difficult to interpret into explainable models, affecting their potential for implementation within practical video coding applications. This paper introduces a novel explainable neural network-based inter-prediction scheme, to improve the interpolation of reference samples needed for fractional precision motion compensation. The approach requires a single neural network to be trained from which a full quarter-pixel interpolation filter set is derived, as the network is easily interpretable due to its linear structure. A novel training framework enables each network branch to resemble a specific fractional shift. This practical solution makes it very efficient to use alongside conventional video coding schemes. When implemented in the context of the state-of-the-art Versatile Video Coding (VVC) test model, 0.77%, 1.27% and 2.25% BD-rate savings can be achieved on average for lower resolution sequences under the random access, low-delay B and low-delay P configurations, respectively, while the complexity of the learned interpolation schemes is significantly reduced compared to the interpolation with full CNNs.
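The learned interpolation described here replaces the fixed fractional-sample filters of the conventional codec. For reference, the sketch below applies HEVC's 8-tap luma filters at quarter-sample phases along one row of samples. The same coefficients also appear at VVC's quarter- and half-sample positions. This is a simplified single-pass version: edge samples are padded by clamping, and the result is rounded and clipped directly to the output bit depth, whereas real codecs keep higher intermediate precision for 2D interpolation. The function name is hypothetical.

```python
# Illustrative 1D fractional-sample interpolation; the function name is hypothetical.
# Phases 1-3 use HEVC's 8-tap luma filters; every set sums to 64 (6-bit scale).
LUMA_FILTERS = {
    0: [0, 0, 0, 64, 0, 0, 0, 0],          # integer position (pass-through)
    1: [-1, 4, -10, 58, 17, -5, 1, 0],     # quarter-sample
    2: [-1, 4, -11, 40, 40, -11, 4, -1],   # half-sample
    3: [0, 1, -5, 17, 58, -10, 4, -1],     # three-quarter-sample
}


def interpolate_row(row, pos, phase, bit_depth=8):
    """Sample value at position pos + phase/4 along a 1D row of integers.

    Taps cover row[pos - 3] .. row[pos + 4]; out-of-range indices are clamped
    (edge padding). The sum is rounded, shifted right by 6, and clipped.
    """
    taps = LUMA_FILTERS[phase]
    acc = 0
    for k, c in enumerate(taps):
        idx = min(max(pos + k - 3, 0), len(row) - 1)
        acc += c * row[idx]
    value = (acc + 32) >> 6
    return max(0, min((1 << bit_depth) - 1, value))
```

For example, `interpolate_row([2 * i for i in range(16)], 5, 2)` returns 11, the midpoint between samples 10 and 12. A learned scheme like the one above would supply its own coefficient set per phase in place of `LUMA_FILTERS`.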
- Research Article
5
- 10.1109/access.2021.3077596
- Jan 1, 2021
- IEEE Access
- Kang-Ho Lee + 1 more
Because of resource-constrained environments, network compression has become an essential part of deep neural network research. In this paper, we identify a mutual relationship between kernel weights, termed Inter-Layer Kernel Correlation (ILKC): the kernel weights of two different convolution layers share substantial similarity in shape and value. Based on this relationship, we propose a new compression method, Inter-Layer Kernel Prediction (ILKP), which represents convolutional kernels with fewer bits by exploiting the similarity between kernel weights in convolutional neural networks. Furthermore, to effectively adapt the inter prediction scheme from video coding technology, we integrate a linear transformation into the prediction scheme, which significantly enhances compression efficiency. The proposed method achieved 93.77% top-1 accuracy with a $4.1\times $ compression ratio compared to the ResNet110 baseline model on CIFAR10. This corresponds to a 0.04% top-1 accuracy improvement with a smaller memory footprint. Moreover, when combined with quantization, the proposed method achieved a $13\times $ compression ratio with little performance degradation compared to the ResNet baseline models trained on CIFAR10 and CIFAR100.
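The core idea of predicting one layer's kernel from another through a linear transformation, then storing only the residual, can be sketched as follows. This is a generic least-squares scale-and-bias predictor over flattened kernel weights, assumed for illustration. It is not ILKP's exact formulation, the function names are hypothetical, and a real scheme would quantize the residual to save bits.

```python
# Illustrative linear kernel prediction with residual; names are hypothetical,
# and this is not ILKP's exact formulation.


def fit_linear(ref, target):
    """Least-squares scale a and bias b so that a * ref + b approximates target."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_t = sum(target) / n
    var = sum((r - mean_r) ** 2 for r in ref)
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(ref, target))
    a = cov / var if var else 0.0
    b = mean_t - a * mean_r
    return a, b


def predict_kernel(ref, target):
    """Predict target weights from a reference layer's weights; keep the residual."""
    a, b = fit_linear(ref, target)
    residual = [t - (a * r + b) for r, t in zip(ref, target)]
    return a, b, residual


def reconstruct_kernel(ref, a, b, residual):
    """Invert predict_kernel: linear prediction plus the stored residual."""
    return [a * r + b + e for r, e in zip(ref, residual)]
```

When the two kernels are strongly correlated, the residual is small and can be represented with fewer bits than the original weights. This mirrors how inter prediction leaves only a small residual to code.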
- Research Article
33
- 10.1109/tmm.2019.2961504
- Jan 10, 2020
- IEEE Transactions on Multimedia
- Jiaying Liu + 2 more
Inter prediction is an important module in video coding for temporal redundancy removal, where similar reference blocks are searched for in previously coded frames and used to predict the block to be coded. Although traditional video codecs can estimate and compensate for block-level motion, their inter prediction performance is still heavily affected by the remaining inconsistent pixel-wise displacement caused by irregular rotation and deformation. In this paper, we address the problem by proposing a deep frame interpolation network to generate additional reference frames in coding scenarios. First, we summarize the previous adaptive convolutions used for frame interpolation and propose a factorized kernel convolutional network to improve the modeling capacity while keeping its compact form. Second, to better train this network, multi-domain hierarchical constraints are introduced to regularize the training of our factorized kernel convolutional network. For the spatial domain, we use a gradually down-sampled and up-sampled auto-encoder to generate the factorized kernels for frame interpolation at different scales. For the quality domain, considering the inconsistent quality of the input frames, the factorized kernel convolution is modulated with quality-related features to learn to exploit more information from high-quality frames. For the frequency domain, a sum of absolute transformed differences loss that performs frequency transformation is used to facilitate network optimization from the perspective of coding performance. With the well-designed frame interpolation network regularized by multi-domain hierarchical constraints, our method achieves an average BD-rate saving of 6.1%, and up to 11.0%, over HEVC for the luma component under the random access configuration.
- Research Article
1
- 10.32604/cmc.2020.011841
- Jan 1, 2020
- Computers, Materials & Continua
- Hongjin Zhu + 6 more
Studies show that the encoding technologies in H.264/AVC, including prediction and transform, are essential. However, these technologies are more complex than those of MPEG-4, a standard that is widely adopted worldwide. Therefore, H.264/AVC requires significantly more computation than MPEG-4. The present study aims to reduce the computational cost of the international standard compression coding system H.264/AVC for moving images. Inter prediction is the most effective compression technique, yet it accounts for up to 60% of the entire encoding. In this regard, prediction error and motion vector information are used to simplify the computation of inter predictive coding. In the initial frame, motion compensation is performed in all target modes, and basic information is then collected and analyzed. After the initial frame, motion compensation is performed only in the middle 8×8 modes, and the amount of basic information shifts accordingly. To evaluate the effectiveness of the proposed method for moving-image compression coding, four types of motion images defined by the International Telecommunication Union (ITU) are employed. Based on the obtained results, it is concluded that the developed method simplifies the calculation, with only a slight effect on image quality and the amount of information.
- Research Article
8
- 10.1109/access.2020.3013804
- Jan 1, 2020
- IEEE Access
- Buddhiprabha Erabadda + 3 more
The hierarchical quadtree partitioning of Coding Tree Units (CTU) is one of the striking features in HEVC that contributes towards its superior coding performance over its predecessors. However, the brute force evaluation of the quadtree hierarchy using the Rate-Distortion (RD) optimisation, to determine the best partitioning structure for a given content, makes it one of the most time-consuming operations in HEVC encoding. In this context, this paper proposes an intelligent fast Coding Unit (CU) size selection algorithm to expedite the encoding process of HEVC inter-prediction. The proposed algorithm introduces (i) two CU split likelihood modelling and classification approaches using Support Vector Machines (SVM) and Bayesian probabilistic models, and (ii) a fast CU selection algorithm that makes use of both offline trained SVMs and online trained Bayesian probabilistic models. Finally, (iii) a computational complexity to coding efficiency trade-off mechanism is introduced to flexibly control the algorithm to suit different encoding requirements. The experimental results of the proposed algorithm demonstrate an average encoding time reduction performance of 53.46%, 61.15%, and 58.15% for Low Delay B, Random Access, and Low Delay P configurations, respectively, with Bjøntegaard Delta-Bit Rate (BD-BR) losses of 2.35%, 2.9%, and 2.35%, respectively, when evaluated across a wide range of content types and quality levels.
- Research Article
1
- 10.1007/s11036-018-1174-0
- Dec 18, 2018
- Mobile Networks and Applications
- Tengrui Shi + 1 more
HEVC has made significant progress in coding efficiency. Compared with the previous-generation H.264/AVC video coding standard, HEVC can save 50% of the bit rate at the same visual quality. However, this drop in bit rate is accompanied by an increase in computational complexity, which poses a great challenge to the application and development of HEVC. After analyzing the strong correlation between the coding unit (CU) partitioning mode in the encoder and several parameters, we model CU partitioning as a classification problem in data mining and solve it with a decision tree. We implement the decision tree in the official reference software HM16.2 and test the algorithm on the test set. Experiments show that the decision-tree-based fast algorithm effectively improves encoding speed with little influence on encoding performance.
- Research Article
9
- 10.1109/tcsvt.2017.2769884
- Oct 12, 2018
- IEEE Transactions on Circuits and Systems for Video Technology
- Changyue Ma + 5 more
Inter prediction is a fundamental technology in video coding to remove the temporal redundancy between video frames. Traditionally, the reconstructed frames are directly put into a reference frame buffer to serve as references for inter prediction. Using multiple reference frames increases the accuracy of inter prediction but also wastes buffer memory, since the content of reference frames is highly similar. To address this problem, we propose to organize the references at clip level in addition to frame level, i.e., the reference buffer stores not only reference frames but also reference clips, which are cropped regions selected from the reconstructed frames. Using clip-level references, we can manage the reference content more economically, since the content of multiple reference frames is divided into the singular content of each frame and the repetitive content that appears in multiple frames. For the repetitive content, only one copy is stored in reference clips to avoid duplication. Moreover, using reference clips also facilitates bit-rate allocation among reference content, i.e., the quality of each clip can be decided adaptively to achieve rate-distortion optimization. In this paper, we propose a complete video coding framework using reference clips and systematically investigate how to generate reference clips as either singular content clips or repetitive content clips, how to manage the clips, how to utilize the clips for inter prediction, and how to allocate bit rate among clips. The proposed video coding framework is implemented upon the state-of-the-art video coding scheme, High Efficiency Video Coding (HEVC). Experimental results show that our scheme achieves average BD-rate reductions of 5.1% and 5.0% relative to the HEVC anchor in the low-delay B and low-delay P settings, respectively.
We believe that reference clip opens up a new dimension for optimizing inter prediction in video coding, and thus is worthy of further study.