Abstract

Learned video compression has recently drawn great attention and shown promising compression performance. In this article, we focus on two components of the learned video compression framework, the conditional entropy model and the quality enhancement module, to improve compression performance. Specifically, we propose an adaptive spatial-temporal entropy model for image, motion, and residual compression, which introduces a temporal prior to reduce the temporal redundancy of latents and an additional modulated mask that evaluates temporal similarity and refines the prior accordingly. In addition, a quality enhancement module is proposed for the predicted and reconstructed frames to improve frame quality and reduce the bitrate cost of residual coding. The module reuses the decoded optical flow as a motion prior and applies deformable convolution to mine high-quality information from the reference frame in a bit-free manner, i.e., without transmitting any extra bits. The two proposed coding tools are integrated into a pixel-domain residual-coding-based compression framework to evaluate their effectiveness. Experimental results demonstrate that, in the low-delay scenario, our framework achieves competitive compression performance compared with recent learning-based methods and traditional H.265/HEVC in terms of Peak Signal-to-Noise Ratio (PSNR) and Multi-Scale Structural Similarity Index (MS-SSIM). The code is available at OpenLVC.
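
To make the entropy-model idea concrete, the following is a minimal PyTorch sketch of how a temporal prior from the previous latent can be fused with a hyperprior, gated by a learned modulated mask that down-weights the temporal prior where it is unreliable. All layer names, channel counts, and the exact fusion layout here are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTemporalEntropyModel(nn.Module):
    """Sketch: predict Gaussian parameters (mu, sigma) for the current latent
    from a hyperprior feature and a temporal prior derived from the previous
    decoded latent. A sigmoid mask modulates the temporal prior per position."""

    def __init__(self, ch=128):
        super().__init__()
        # Temporal prior: transform the previous latent into a prediction cue.
        self.temporal_prior = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
        # Modulated mask: estimate how similar/useful the temporal cue is,
        # conditioned on both priors (values in [0, 1]).
        self.mask_head = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.LeakyReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
        # Map the fused priors to entropy parameters.
        self.param_head = nn.Conv2d(2 * ch, 2 * ch, 1)

    def forward(self, hyper_feat, prev_latent):
        t = self.temporal_prior(prev_latent)
        m = self.mask_head(torch.cat([hyper_feat, t], dim=1))
        fused = torch.cat([hyper_feat, m * t], dim=1)  # masked temporal refinement
        mu, sigma = self.param_head(fused).chunk(2, dim=1)
        return mu, F.softplus(sigma)  # sigma kept positive for the Gaussian model
```

In a real codec the returned (mu, sigma) would parameterize the conditional probability model used by the arithmetic coder; the mask lets the model fall back to the purely spatial hyperprior in occluded or fast-moving regions.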
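The flow-guided enhancement module can be sketched similarly. The snippet below shows one plausible way to reuse the decoded optical flow as an initial offset for `torchvision.ops.DeformConv2d` and to let the network learn residual offsets plus a modulation mask; the channel-ordering assumption for the flow and every layer name are hypothetical, not taken from the paper:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FlowGuidedEnhancer(nn.Module):
    """Sketch: enhance a predicted/reconstructed frame by sampling the
    reference frame with flow-guided deformable convolution. The decoded
    flow is reused as the motion prior, so no extra bits are transmitted."""

    def __init__(self, ch=64, k=3, offset_groups=8):
        super().__init__()
        self.k, self.g = k, offset_groups
        self.feat = nn.Conv2d(3, ch, 3, padding=1)  # shallow feature extractor
        # Predict residual offsets (2 per tap) and a modulation mask (1 per tap)
        # from current features, reference features, and the decoded flow.
        self.offset_head = nn.Conv2d(2 * ch + 2, 3 * offset_groups * k * k, 3, padding=1)
        self.dcn = DeformConv2d(ch, ch, k, padding=k // 2)
        self.fuse = nn.Conv2d(2 * ch, 3, 3, padding=1)  # map back to an RGB residual

    def forward(self, cur, ref, flow):
        # cur/ref: (N, 3, H, W) frames; flow: (N, 2, H, W) decoded optical flow.
        fc, fr = self.feat(cur), self.feat(ref)
        o = self.offset_head(torch.cat([fc, fr, flow], dim=1))
        off, mask = torch.split(
            o, [2 * self.g * self.k ** 2, self.g * self.k ** 2], dim=1)
        # Start sampling at the motion-compensated location: tile the flow over
        # all kernel taps (flipped to the (dy, dx) order deform_conv2d expects,
        # assuming the flow is stored as (dx, dy)), then add learned residuals.
        off = off + flow.flip(1).repeat(1, self.g * self.k ** 2, 1, 1)
        aligned = self.dcn(fr, off, torch.sigmoid(mask))
        return cur + self.fuse(torch.cat([fc, aligned], dim=1))  # residual enhancement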
