Abstract

Traditional block-based spatially scalable video coding has been studied for over twenty years. While significant advancements have been made, the scope for further improvement in compression performance is limited. Inspired by the success of learned video coding, we propose an end-to-end learned spatially scalable video coding scheme, LSSVC, which provides a new solution for scalable video coding. In LSSVC, we propose to use the motion, texture, and latent information of the base layer (BL) as interlayer information for compressing the enhancement layer (EL). To reduce interlayer redundancy, we design three modules to leverage the upsampled interlayer information. Firstly, we design a contextual motion vector (MV) encoder-decoder, which utilizes the upsampled BL motion information to help compress the high-resolution MVs. Secondly, we design a hybrid temporal-layer context mining module to learn more accurate contexts from the EL temporal features and the upsampled BL texture information. Thirdly, we use the upsampled BL latent information as an interlayer prior for the entropy model to estimate more accurate probability distribution parameters for the high-resolution latents. Experimental results show that our scheme surpasses the H.265/SHVC reference software by a large margin. Our code is available at https://github.com/EsakaK/LSSVC.
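The sketch below illustrates the interlayer conditioning idea summarized above: BL motion, texture, and latent information are upsampled to the EL resolution and used to condition MV coding, context mining, and entropy modeling, respectively. It is a minimal PyTorch-style sketch for intuition only; all module and tensor names are illustrative assumptions and do not reflect the official implementation at https://github.com/EsakaK/LSSVC.

```python
# Illustrative sketch of interlayer conditioning (not the official LSSVC code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterlayerContext(nn.Module):
    """Upsamples base-layer (BL) motion, texture, and latent features so they
    can condition enhancement-layer (EL) coding of the higher-resolution frame.
    Channel counts and fusion design are assumptions for this sketch."""

    def __init__(self, ch: int = 64):
        super().__init__()
        # Small fusion network merging EL temporal features with upsampled
        # BL texture features (stand-in for the hybrid temporal-layer context).
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    @staticmethod
    def upsample(x: torch.Tensor, scale: float = 2.0) -> torch.Tensor:
        # Bilinear upsampling of BL information to EL resolution.
        return F.interpolate(x, scale_factor=scale, mode="bilinear",
                             align_corners=False)

    def forward(self, bl_motion, bl_texture, bl_latent, el_temporal):
        # 1) Upsampled BL motion serves as a prior for the contextual MV
        #    encoder-decoder; MV magnitudes are rescaled with resolution.
        mv_prior = self.upsample(bl_motion) * 2.0

        # 2) Fuse EL temporal features with upsampled BL texture features to
        #    form the coding context for the EL frame.
        context = self.fuse(
            torch.cat([el_temporal, self.upsample(bl_texture)], dim=1))

        # 3) Upsampled BL latent acts as an interlayer prior for the EL
        #    entropy model's distribution-parameter estimation.
        latent_prior = self.upsample(bl_latent)
        return mv_prior, context, latent_prior
```

In this view, the three modules share one principle: information already decoded at the base layer is reused, after spatial upsampling, as side information for the enhancement layer, so the EL bitstream only needs to carry the residual detail.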
