Abstract

Recently, deep learning-based image compression has shown significant improvements in coding efficiency and subjective quality. However, relatively little effort has been devoted to video compression based on deep neural networks. In this paper, we propose an end-to-end deep predictive video compression network, called DeepPVCnet, that uses mode-selective uni- and bi-directional predictions based on a multi-frame hypothesis, together with a multi-scale structure and a temporal-context-adaptive entropy model. DeepPVCnet jointly compresses the motion information and residual data generated from the multi-scale structure via feature transformation layers. Recent deep learning-based video compression methods have been proposed for limited compression settings that use only P-frames or only B-frames. Learning from the lessons of conventional video codecs, we are the first to incorporate a mode-selective framework with uni- and bi-directional predictive modes into DeepPVCnet, where the mode is chosen in a rate-distortion minimization sense. We also propose a temporal-context-adaptive entropy model that utilizes the temporal context information of the reference frames to code the current frame. Autoregressive entropy models for CNN-based image and video compression are difficult to compute with parallel processing. In contrast, our temporal-context-adaptive entropy model exploits temporally coherent context from the reference frames, so the context information can be computed in parallel, which is computationally and architecturally advantageous. Extensive experiments show that DeepPVCnet outperforms AVC/H.264, HEVC/H.265 and state-of-the-art methods in terms of MS-SSIM.
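
To make the rate-distortion mode selection described above concrete, here is a minimal sketch (not the paper's implementation) of choosing between the uni- and bi-directional predictive modes by minimizing the cost J = D + λ·R; the CodingResult type, the λ value, and the example numbers are hypothetical placeholders.

```python
from typing import Dict, NamedTuple

class CodingResult(NamedTuple):
    rate: float        # estimated bits needed for the compressed frame
    distortion: float  # e.g. MSE (or 1 - MS-SSIM) of the reconstructed frame

def select_mode(candidates: Dict[str, CodingResult], lam: float = 0.01) -> str:
    """Return the predictive mode with the smallest R-D cost J = D + lam * R."""
    return min(candidates, key=lambda m: candidates[m].distortion + lam * candidates[m].rate)

# Hypothetical per-frame results produced by the two predictive branches.
candidates = {
    "uni-directional": CodingResult(rate=5200.0, distortion=0.0021),
    "bi-directional":  CodingResult(rate=4700.0, distortion=0.0018),
}
print(select_mode(candidates))  # -> "bi-directional" for these numbers
```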

Highlights

  • Conventional video codecs such as AVC/H.264 [45], HEVC/H.265 [38] and VP9 [29] have achieved significantly improved coding efficiency, especially by enhancing the accuracy of temporal prediction of the current frame from its adjacent frames

  • There are three frame coding modes used in video compression: I-frame mode, in which a frame is compressed independently of its adjacent frames; P-frame mode, in which a frame is compressed through forward prediction using motion information; and B-frame mode, in which a frame is compressed with bi-directional prediction (a minimal sketch of this mode assignment follows the list)

  • We propose an end-to-end deep predictive video compression network, called DeepPVCnet, using mode-selective uni- and bi-directional predictions based on multi-frame hypothesis with a multi-scale structure and a temporal-context-adaptive entropy model
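
As a concrete illustration of these frame coding modes, the sketch below assigns I-, P- and B-frame modes over a simple "I B B P" group of pictures (GOP); the GOP size and the reference-frame choices are illustrative assumptions, not the paper's actual coding structure.

```python
# Minimal sketch of I/P/B frame mode assignment for an "I B B P" group of
# pictures (GOP). The GOP size and reference choices are illustrative only.

def assign_modes(num_frames: int, gop_size: int = 4):
    """Return (frame_index, mode, reference_indices) for each frame."""
    frames = []
    for t in range(num_frames):
        start = (t // gop_size) * gop_size                 # the GOP's I-frame
        end = min(start + gop_size - 1, num_frames - 1)    # the GOP's P-frame
        if t == start:
            frames.append((t, "I", []))                    # intra-coded, no references
        elif t == end:
            frames.append((t, "P", [start]))               # forward (uni-directional) prediction
        else:
            frames.append((t, "B", [start, end]))          # bi-directional prediction
    return frames

for frame in assign_modes(8):
    print(frame)
# (0, 'I', []), (1, 'B', [0, 3]), (2, 'B', [0, 3]), (3, 'P', [0]), (4, 'I', []), ...
```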


Introduction

Conventional video codecs such as AVC/H.264 [45], HEVC/H.265 [38] and VP9 [29] have achieved significantly improved coding efficiency, especially by enhancing the accuracy of temporal prediction of the current frame from its adjacent frames. Many recent studies on deep learning-based image compression [5], [6], [16], [21], [23], [27], [35], [40]–[42] adopt auto-encoder based end-to-end compression architectures in an attempt to improve compression performance. These works have shown higher coding efficiency than traditional image compression methods such as JPEG [43], JPEG2000 [37], and BPG [7].
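
For readers unfamiliar with such architectures, the sketch below illustrates the general shape of an auto-encoder based learned image codec trained with a rate-distortion objective; the layer configuration, the additive-noise quantization proxy, the rate proxy, and the λ value are illustrative assumptions and do not correspond to any specific cited method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyImageCodec(nn.Module):
    """Toy auto-encoder style image codec: analysis -> quantization proxy -> synthesis."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.analysis = nn.Sequential(                      # image -> latent y
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2),
        )
        self.synthesis = nn.Sequential(                     # quantized latent -> reconstruction
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, x):
        y = self.analysis(x)
        y_hat = y + torch.rand_like(y) - 0.5                # additive-noise proxy for quantization
        x_hat = self.synthesis(y_hat)
        rate_proxy = y_hat.abs().mean()                     # crude stand-in for a learned entropy model
        return x_hat, rate_proxy

model = TinyImageCodec()
x = torch.rand(1, 3, 64, 64)
x_hat, rate = model(x)
loss = rate + 100.0 * F.mse_loss(x_hat, x)                  # R + lambda * D objective
```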
