Abstract

Mainstream image/video coding standards, exemplified by the state-of-the-art H.266/VVC, AVS3, and AV1, follow the block-based hybrid coding framework. Because of this block-based framework, encoders designed for these standards are easily optimized for peak signal-to-noise ratio (PSNR) but have difficulty optimizing for metrics more closely aligned with perceptual quality, e.g., multi-scale structural similarity (MS-SSIM), since such metrics cannot be accurately evaluated at the small block level. We address this problem by drawing inspiration from end-to-end image compression built on deep networks, which is easily optimized through network training for any metric as long as the metric is differentiable. We compared models trained with the same network structure but different metrics and observed that the models allocate rates in different ratios. We then propose a distillation method that extracts the rate allocation rule from end-to-end image compression models trained with different metrics and applies that rule in block-based encoders. We implement the proposed method on the VVC reference software (VTM) and the AVS3 reference software (HPM), focusing on intraframe coding. Experimental results show that the proposed method on top of VTM achieves more than 10% BD-rate reduction relative to the anchor when evaluated with MS-SSIM or LPIPS, which leads to concrete perceptual quality improvement.
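To make the training-for-any-metric idea concrete, here is a minimal, self-contained sketch (not from the paper) of the Lagrangian rate-distortion objective L = R + λ·D that end-to-end codecs minimize, where D can be any (differentiable) distortion metric such as MS-SSIM or LPIPS. The operating points and λ values below are hypothetical; MSE-like numbers stand in for a real metric.

```python
# Toy sketch: an end-to-end codec trades off rate R (bits per pixel)
# against distortion D via the Lagrangian cost L = R + lambda * D.
# In training, D is computed by a differentiable metric network/function;
# here we just pick the best of a few hypothetical operating points.

def rd_loss(rate, distortion, lam):
    """Lagrangian rate-distortion cost L = R + lambda * D."""
    return rate + lam * distortion

# Hypothetical (rate in bpp, distortion) operating points.
points = [(0.30, 40.0), (0.50, 25.0), (0.80, 15.0)]

def best_point(points, lam):
    """Pick the point minimizing the Lagrangian cost.

    A small lambda favors low rate; a large lambda favors low
    distortion, shifting where bits are allocated -- the effect the
    paper distills from metric-specific end-to-end models.
    """
    return min(points, key=lambda p: rd_loss(p[0], p[1], lam))

print(best_point(points, lam=0.005))  # rate-dominated choice
print(best_point(points, lam=0.05))   # distortion-dominated choice
```

Changing the metric behind D (e.g., MSE vs. MS-SSIM) changes which operating points win, which is why models trained under different metrics allocate rates in different ratios.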
