Abstract

Novel view synthesis generates images from new viewpoints using multiple images of a scene captured from known viewpoints. Using wide-baseline stereo image pairs for novel view synthesis allows a scene to be rendered from varied perspectives with only two images, significantly reducing image acquisition and storage costs and improving the efficiency of 3D scene reconstruction. However, the large geometric difference and severe occlusion between a pair of wide-baseline stereo images often cause artifacts and holes in the synthesized images. To address these issues, we propose a method that integrates both local and global information to synthesize novel views from wide-baseline stereo image pairs. First, our method aggregates a cost volume with local information using a Convolutional Neural Network (CNN) and employs a Transformer to capture global features. This optimizes disparity prediction, improving depth estimation and the reconstruction quality of 3D scene representations from wide-baseline stereo image pairs. Second, our method uses a CNN to capture local semantic information and a Transformer to model long-range contextual dependencies, generating high-quality novel view images. Extensive experiments demonstrate that our method effectively reduces artifacts and holes, enhancing the quality of novel views synthesized from wide-baseline stereo image pairs.
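To make the cost-volume idea concrete: a stereo cost volume stores, for each pixel and each candidate disparity, a matching cost between the left feature and the correspondingly shifted right feature; disparity is then read out by selecting the lowest-cost candidate. The sketch below is a minimal NumPy illustration of this general technique (plane-sweep cost volume with an L1 matching cost and winner-take-all readout), not the learned CNN/Transformer aggregation proposed in the paper; the function name and feature shapes are assumptions for illustration.

```python
import numpy as np

def build_cost_volume(feat_left, feat_right, max_disp):
    """Illustrative stereo cost volume (not the paper's learned method).

    feat_left, feat_right: (H, W, C) feature maps.
    cost[h, w, d] = L1 distance between the left feature at column w
    and the right feature at column w - d (invalid shifts stay inf).
    """
    H, W, _ = feat_left.shape
    cost = np.full((H, W, max_disp), np.inf)
    for d in range(max_disp):
        # left pixel at column w corresponds to right pixel at column w - d
        diff = np.abs(feat_left[:, d:, :] - feat_right[:, : W - d, :])
        cost[:, d:, d] = diff.sum(axis=-1)
    return cost

# toy example: construct a right view whose true disparity is 3 everywhere
rng = np.random.default_rng(0)
left = rng.standard_normal((8, 16, 4))
right = np.roll(left, shift=-3, axis=1)  # right[:, w - 3] == left[:, w]
cost = build_cost_volume(left, right, max_disp=6)
disp = cost.argmin(axis=-1)  # winner-take-all disparity readout
```

In the toy example the recovered disparity is 3 wherever the shift is valid (columns 3 onward). Learned methods such as the one described above replace the fixed L1 cost with CNN-aggregated local features and add global (Transformer) context, which is what makes disparity prediction robust under wide baselines.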
