Abstract

Virtual view synthesis from monocular video is challenging, as it aims to infer photorealistic views from a single reference view. Previous work has achieved acceptable visual quality but relies heavily on supervision, such as depth maps or pristine virtual views, which are often unavailable in practical applications. In this paper, an unsupervised virtual view synthesis method is proposed that dispenses with such supervision. First, it embeds a spatiotemporal generative adversarial network into the traditional depth-image-based rendering framework without requiring explicit depth information. Second, it employs novel perceptual constraints that do not rely on pristine images, namely a blind synthesized-image quality metric and a no-reference structural similarity measure. The entire framework is fully convolutional, producing hallucinated views in an end-to-end manner; in particular, it is independent of any supervision information. Experimental results demonstrate that the proposed method produces virtual views on par with those of supervised methods, and can therefore benefit practical applications.
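
The abstract only sketches the objective, so the following is a minimal, hypothetical PyTorch sketch of how a generator step under such an unsupervised objective might look. The network shapes, the loss weights, and both no-reference terms (`blind_quality` as a total-variation proxy, `nr_structure` as a gradient-consistency proxy) are illustrative assumptions, not the paper's actual metrics or architecture.

```python
# Illustrative sketch only: stand-ins for the paper's unsupervised losses.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Fully convolutional generator: reference frame -> virtual view."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Spatiotemporal discriminator over a stack of consecutive frames."""
    def __init__(self, channels=3, frames=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels * frames, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),
        )
    def forward(self, x):
        return self.net(x)

def blind_quality(img):
    # Hypothetical stand-in for a blind synthesized-image quality metric,
    # approximated here by a total-variation smoothness penalty.
    tv_h = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    tv_w = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return tv_h + tv_w

def nr_structure(fake, ref):
    # Hypothetical stand-in for a no-reference structure-similarity term:
    # encourages the virtual view to preserve the reference's gradients.
    def grads(x):
        return x[..., 1:, :] - x[..., :-1, :], x[..., :, 1:] - x[..., :, :-1]
    (fh, fw), (rh, rw) = grads(fake), grads(ref)
    return (fh - rh).abs().mean() + (fw - rw).abs().mean()

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

# Two consecutive reference frames (random tensors as placeholder data).
ref_t, ref_t1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
fake_t, fake_t1 = G(ref_t), G(ref_t1)

# Generator step: adversarial term on a spatiotemporal frame stack plus
# the two no-reference perceptual terms; the 0.1 weights are assumptions.
logits = D(torch.cat([fake_t, fake_t1], dim=1))
loss = (bce(logits, torch.ones_like(logits))
        + 0.1 * blind_quality(fake_t)
        + 0.1 * nr_structure(fake_t, ref_t))
opt_g.zero_grad()
loss.backward()
opt_g.step()
```

Note that no ground-truth virtual view or depth map appears anywhere in the loss, which is the defining property of the unsupervised setup described above; only the generator update is shown, and the discriminator would be trained with a corresponding real/fake step.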
