Abstract
In 3D video systems, synthesized views are typically rendered with view synthesis technology, mainly Depth Image Based Rendering (DIBR), and suffer from both compression and 3D warping artifacts, which may degrade the perceptual quality of 3D video. Because viewers readily notice DIBR distortion in synthesized views, such as cracks and irregular stretching, Synthesized View Quality Enhancement (SVQE) should pay more attention to DIBR distortion. In this paper, we propose a Distortion Map-guided Asymmetrical encoder–decoder restoration Network for SVQE, termed DMANet, which prioritizes human perceptual factors while balancing effectiveness and efficiency. Specifically, to account for these perceptual characteristics, we introduce a distortion-aware module that embeds the predicted DIBR distortion into the restoration network through multi-scale feature embedding and works together with a DIBR distortion prediction loss to focus more on DIBR-distorted regions. Meanwhile, to improve the efficiency of the U-shaped network, we propose an asymmetrical encoder–decoder restoration network in which the encoder progressively integrates both transformer and CNN modules for local–global feature extraction, while the decoder uses only the CNN module. Furthermore, the hybrid transformer-based modules incorporate channel attention interaction and convolutional filters to fully exploit the channel-wise global modeling ability of self-attention while preserving local details. Extensive experimental results show that the proposed DMANet outperforms state-of-the-art (SOTA) SVQE methods and is comparable to SOTA image restoration methods while using fewer model parameters, fewer FLOPs, and a shorter running time.
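The abstract does not spell out how the channel-wise self-attention is formulated, so the following is only a generic sketch of the idea, not the DMANet module itself. It assumes features flattened to C channels by N = H*W pixels and a fixed 1/sqrt(N) scaling; the function name `channel_attention` is hypothetical. The point it illustrates is that the attention map is C x C, so its size does not grow with image resolution.

```python
import math


def channel_attention(q, k, v):
    """Channel-wise self-attention over features shaped [C][N].

    q, k, v: lists of C rows, each row holding N = H*W spatial values.
    Returns a [C][N] list. The attention map is C x C, so its size is
    independent of the number of pixels N.
    """
    c = len(q)
    n = len(q[0])
    # Assumed fixed scaling; a learned temperature is a common alternative.
    scale = 1.0 / math.sqrt(n)

    out = []
    for i in range(c):
        # Similarity of channel i with every channel j (dot product over pixels).
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in range(c)]
        # Numerically stable softmax over the channel axis.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output row i is a weighted mix of all value channels.
        out.append([sum(weights[j] * v[j][p] for j in range(c)) for p in range(n)])
    return out
```

Each output channel is a convex combination of the value channels, which is how a single channel can draw on information from the whole image. In a hybrid design such as the one described above, convolutional filters would be what supplies the local detail that this global mixing does not capture.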