In multi-view video systems, the decoded texture video and its corresponding depth video are used to synthesize virtual views from different perspectives via depth-image-based rendering (DIBR) in 3D High Efficiency Video Coding (3D-HEVC). However, compression distortion in the multi-view video and the disocclusion problem in DIBR can easily produce obvious holes and cracks in the synthesized views, degrading their visual quality. To address this problem, a novel two-stream re-parameterized refocusing hybrid attention (TRRHA) network is proposed to significantly improve the quality of synthesized views. First, a global multi-scale residual information stream extracts global context with a refocusing attention module (RAM), which detects contextual features and adaptively learns channel and spatial attention to selectively focus on different areas. Second, a local feature pyramid attention information stream fully captures complex local texture details with a re-parameterized refocusing attention module (RRAM), which captures multi-scale texture details under different receptive fields and adaptively adjusts channel and spatial weights to handle information at different sizes and levels. Finally, an efficient feature fusion module is proposed to fuse the extracted global and local information streams. Extensive experimental results show that the proposed TRRHA significantly outperforms state-of-the-art methods. The source code will be available at https://github.com/647-bei/TRRHA.
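The abstract does not specify the internals of the RAM or RRAM; as a rough illustration of the generic "channel and spatial attention" idea it invokes, the following is a minimal NumPy sketch (not the authors' implementation) in which a feature map is first gated per channel by a pooled channel descriptor and then gated per spatial location. The pooling choices, gating order, and use of a plain sigmoid without learned weights are all simplifying assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Global average pooling over spatial dims -> (C,),
    # then a per-channel gate in (0, 1) rescales each channel.
    pooled = feat.mean(axis=(1, 2))
    weights = sigmoid(pooled)
    return feat * weights[:, None, None]

def spatial_attention(feat):
    # Pool across channels to an (H, W) map, then gate every location.
    pooled = feat.mean(axis=0)
    weights = sigmoid(pooled)
    return feat * weights[None, :, :]

def hybrid_attention(feat):
    # Channel gating followed by spatial gating (the order is an assumption).
    return spatial_attention(channel_attention(feat))

feat = np.random.rand(8, 16, 16).astype(np.float32)  # toy (C, H, W) feature map
out = hybrid_attention(feat)
```

In a real network these gates would be produced by small learned sub-networks (e.g. MLPs or convolutions) rather than raw pooled values, and the re-parameterization in the RRAM would additionally fold multi-branch convolutions into a single kernel at inference time.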