Abstract

Traditional block-based spatially scalable video coding has been studied for over twenty years. While significant advancements have been made, the scope for further improvement in compression performance is limited. Inspired by the success of learned video coding, in this paper we propose an end-to-end learned spatially scalable video coding scheme, LSSVC, which provides a new solution for scalable video coding. In LSSVC, we propose to use the motion, texture, and latent information of the base layer (BL) as interlayer information. To reduce interlayer redundancy, we design three modules that leverage the upsampled interlayer information. First, we design a contextual motion vector (MV) encoder-decoder, which uses the upsampled BL motion information as enhancement information to help compress high-resolution (HR) MVs. Second, we design a hybrid temporal-layer context mining module to learn more accurate contexts from the enhancement-layer (EL) temporal features and the upsampled BL texture information. Third, we use the upsampled BL latent information as an interlayer prior for the entropy model, enabling more accurate estimation of the probability distribution parameters of the HR latents. Experimental results show that our scheme surpasses the H.265/SHVC reference software by a large margin.
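As a rough illustration of the third idea, the PyTorch sketch below conditions an EL entropy model on upsampled BL latents to predict the mean and scale of the HR latent distribution. This is a minimal sketch, not the paper's implementation: the module name, channel counts, fusion network, and the 2x resolution ratio are all illustrative assumptions.

```python
# Hypothetical sketch of an interlayer prior for the EL entropy model.
# Upsampled BL latents are fused with EL hyperprior features to predict
# Gaussian parameters (mean, scale) for the HR latents.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterlayerPriorEntropyModel(nn.Module):
    """Predicts mean/scale of EL latents from upsampled BL latents."""
    def __init__(self, channels: int = 192):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, kernel_size=3, padding=1),
        )

    def forward(self, el_hyper: torch.Tensor, bl_latent: torch.Tensor):
        # Upsample the low-resolution BL latent to the EL latent's spatial size.
        bl_up = F.interpolate(bl_latent, size=el_hyper.shape[-2:],
                              mode="bilinear", align_corners=False)
        # Fuse EL hyperprior features with the interlayer prior, then split
        # the output channels into distribution mean and scale.
        mean, scale = self.fusion(
            torch.cat([el_hyper, bl_up], dim=1)).chunk(2, dim=1)
        return mean, F.softplus(scale)  # scale must be positive

# Usage with assumed 2x spatial scalability: BL latents at half resolution.
model = InterlayerPriorEntropyModel(channels=192)
el_hyper = torch.randn(1, 192, 16, 16)   # EL hyperprior features
bl_latent = torch.randn(1, 192, 8, 8)    # BL latents at half resolution
mean, scale = model(el_hyper, bl_latent)
```

The same conditioning pattern (upsample, concatenate, fuse) would apply analogously to the contextual MV encoder-decoder and the hybrid temporal-layer context mining module described above.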
