The performance of existing learning-based methods in restoring compressed multi-view video (MVV) is limited because they exploit only information from temporally adjacent frames or parallax-neighboring views. However, the compression artifacts introduced by multi-view coding (MVC) may stem from intra-frame, inter-frame, and inter-view reference errors. In this paper, a motion-parallax complementation network (MPCNet) is proposed that carefully exploits stereo information from both the temporal and parallax domains to restore the quality of compressed MVV more effectively. First, we introduce a motion-parallax complementation strategy consisting of a coarse stage and a fine stage. By mutually compensating features extracted from multiple domains, useful multi-frame information is efficiently preserved and aggregated step by step. Second, an attention-based feature filtering and modulation module (AFFM) is proposed, which fuses two features efficiently by suppressing misleading information. Deploying AFFM in most submodules of the proposed approach improves the representational ability of MPCNet and yields stronger restoration performance. Experimental results demonstrate the effectiveness of MPCNet, with average gains of 1.978 dB in PSNR and 0.0282 in MS-SSIM, and an average BD-rate reduction of 47.342%. Subjective quality is also greatly improved, with many compression distortions eliminated. In addition, the proposed method benefits high-level vision tasks: semantic segmentation reaches an mIoU of 0.352 and object detection an mAP of 51.71. Quantitative and qualitative analyses show that MPCNet outperforms state-of-the-art approaches.
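The abstract does not give implementation details of AFFM. As a rough illustration of the general idea it describes, attention-gated fusion of two feature streams that suppresses misleading information, the following is a minimal PyTorch sketch. The module name (AttentionFusion), the sigmoid gating design, and all shapes are assumptions for illustration, not the authors' actual architecture.

```python
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Hypothetical sketch of attention-gated fusion of two feature maps.

    This is NOT the paper's AFFM; it only illustrates suppressing
    unreliable content in one feature with an attention mask computed
    from both features before fusing them.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Attention mask predicted from the concatenated feature pair.
        self.mask = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # Final 1x1 fusion of the gated pair back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        pair = torch.cat([feat_a, feat_b], dim=1)
        attn = self.mask(pair)      # values in (0, 1), per channel and pixel
        gated_b = feat_b * attn     # down-weight misleading parts of feat_b
        return self.fuse(torch.cat([feat_a, gated_b], dim=1))


# Usage: fuse temporal and parallax features of matching shape.
if __name__ == "__main__":
    f_temporal = torch.randn(1, 64, 32, 32)
    f_parallax = torch.randn(1, 64, 32, 32)
    fused = AttentionFusion(64)(f_temporal, f_parallax)
    print(fused.shape)  # torch.Size([1, 64, 32, 32])
```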