Abstract

Applying deep learning to video compression has attracted increasing attention in recent years. In this work, we address end-to-end learned video compression with a special focus on better learning and utilizing temporal contexts. We propose to propagate not only the last reconstructed frame but also the feature obtained before frame reconstruction, and to mine temporal contexts from it. From the propagated feature, we learn multi-scale temporal contexts and re-fill them into the modules of our compression scheme, including the contextual encoder-decoder, the frame generator, and the temporal context encoder. We discard the parallelization-unfriendly auto-regressive entropy model in pursuit of more practical encoding and decoding times. Experimental results show that our proposed scheme achieves a higher compression ratio than existing learned video codecs. Our scheme also outperforms x264 and x265 (industrial software implementations of H.264 and H.265, respectively) as well as the official reference software for H.264, H.265, and H.266 (JM, HM, and VTM, respectively). Specifically, with an intra period of 32, our scheme outperforms H.265 (HM) by a 14.4% bit rate saving when optimized for PSNR, and outperforms H.266 (VTM) by a 21.1% bit rate saving when optimized for MS-SSIM.
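The abstract describes the method only at a high level; the following is a minimal PyTorch sketch of how multi-scale temporal contexts might be mined from a propagated feature. The class name TemporalContextMining, the channel count, the number of scales, and the use of strided convolutions are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn


class TemporalContextMining(nn.Module):
    """Illustrative sketch: derive multi-scale temporal contexts from the
    propagated feature (the feature kept before the reconstructed frame is
    produced). Channel counts and layer choices are assumptions."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # Full-resolution context branch.
        self.context_s0 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Strided convolutions halve the spatial resolution at each scale.
        self.down_1 = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
        self.context_s1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.down_2 = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
        self.context_s2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, propagated_feature: torch.Tensor):
        # Temporal context at full resolution.
        c0 = self.context_s0(propagated_feature)
        # Temporal context at half resolution.
        f1 = self.down_1(propagated_feature)
        c1 = self.context_s1(f1)
        # Temporal context at quarter resolution.
        f2 = self.down_2(f1)
        c2 = self.context_s2(f2)
        # These contexts would then be re-filled into the contextual
        # encoder-decoder, the frame generator, and the temporal context
        # encoder, as described in the abstract.
        return c0, c1, c2


if __name__ == "__main__":
    # Toy usage: a propagated feature for a 256x256 frame with 64 channels.
    feat = torch.randn(1, 64, 256, 256)
    contexts = TemporalContextMining(channels=64)(feat)
    print([c.shape for c in contexts])
```

The multi-scale design reflects the abstract's claim that contexts are learned at several scales and shared across multiple modules; how the contexts are fused into each module is not specified in the abstract and is omitted here.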
