The ability to capture the complementary information in stereo image pairs is critical to the development of stereo image super-resolution. Most existing studies attempt to integrate reliable stereo correspondence along the epipolar direction through parallax attention and various fusion strategies. However, most of these approaches ignore the large disparity variations in stereo images, so convolution-based parallax attention performs poorly at capturing long-range dependencies between the two views. In this paper, we propose a novel cross-view guided stereo image super-resolution network (CVGSR) that fully exploits the complementarity of the two views to reconstruct high-resolution stereo image pairs with rich texture details. Specifically, we first deploy a cross-view interaction module (CVIM) that explores intra- and cross-view dependencies from local to global scales, compensating for the incomplete correspondence between the left and right views. This module uses a progressive cross-guiding strategy to better merge features from occluded and non-occluded regions. Building on this, we improve an efficient attention Transformer (EAT) to activate more input information and further mine cross-view complementarity. Furthermore, we design a texture loss that improves the perceptual quality of the reconstructed images, yielding sharp boundaries and rich texture details. Extensive experiments on four stereo image datasets demonstrate that the proposed CVGSR achieves competitive performance.
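The abstract does not give the exact formulation of the texture loss. As a minimal illustrative sketch, a common way to encourage sharp boundaries is a gradient-difference loss that matches the image gradients of the super-resolved output to those of the ground truth; the function name `texture_loss` and the choice of an L1 penalty on finite differences are assumptions here, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def texture_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Illustrative gradient-difference texture loss (an assumption,
    not the paper's formulation). Penalizes mismatches between the
    horizontal/vertical image gradients of the super-resolved image
    `sr` and the ground-truth image `hr`, encouraging sharp edges
    and fine texture. Both tensors have shape (N, C, H, W).
    """
    # Horizontal (along width) and vertical (along height) finite differences.
    sr_dx = sr[..., :, 1:] - sr[..., :, :-1]
    sr_dy = sr[..., 1:, :] - sr[..., :-1, :]
    hr_dx = hr[..., :, 1:] - hr[..., :, :-1]
    hr_dy = hr[..., 1:, :] - hr[..., :-1, :]
    # L1 distance between gradient maps of the two images.
    return F.l1_loss(sr_dx, hr_dx) + F.l1_loss(sr_dy, hr_dy)
```

In practice such a term would be weighted and added to a pixel-wise reconstruction loss; the weighting scheme used by CVGSR is not specified in the abstract.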