Abstract

Virtual view synthesis from monocular video is challenging, as it aims to infer photorealistic views from a single reference view. Previous works have achieved acceptable visual quality; however, they rely heavily on supervision information, such as depth maps or pristine virtual views, which is often unavailable in practical applications. In this paper, an unsupervised virtual view synthesis method is proposed that dispenses with such supervision. First, it embeds a spatiotemporal generative adversarial network into the traditional depth-image-based rendering (DIBR) framework without requiring explicit depth information. Second, it employs novel perceptual constraints that do not rely on pristine images, namely a blind synthesized-image quality metric and a no-reference structural similarity measure. The entire framework is fully convolutional and produces hallucinated results in an end-to-end manner. In particular, the whole framework is independent of supervision information. Experimental results demonstrate that the proposed method produces pleasing virtual views compared with supervised methods, and can therefore benefit practical applications.
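
To make the unsupervised objective concrete, below is a minimal PyTorch sketch of how an adversarial term might be combined with the two no-reference perceptual constraints described above. The module names (`discriminator`, `nr_quality_metric`, `nr_ssim_metric`) and the loss weights are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class UnsupervisedViewSynthesisLoss(nn.Module):
    """Hypothetical composition of an unsupervised view-synthesis objective:
    an adversarial term from a spatiotemporal GAN critic plus two
    no-reference perceptual terms. All names and weights are assumptions."""

    def __init__(self, discriminator, nr_quality_metric, nr_ssim_metric,
                 w_adv=1.0, w_quality=0.1, w_ssim=0.1):
        super().__init__()
        self.discriminator = discriminator    # spatiotemporal GAN critic
        self.nr_quality = nr_quality_metric   # blind synthesized-image quality metric
        self.nr_ssim = nr_ssim_metric         # no-reference structural similarity
        self.w_adv, self.w_quality, self.w_ssim = w_adv, w_quality, w_ssim

    def forward(self, synthesized_views):
        # Adversarial term: push the critic to score synthesized
        # spatiotemporal volumes as real (non-saturating GAN loss).
        adv = -self.discriminator(synthesized_views).mean()
        # Perceptual terms: both metrics score the synthesized frames
        # directly, so no pristine reference image is needed.
        quality = self.nr_quality(synthesized_views).mean()
        ssim = self.nr_ssim(synthesized_views).mean()
        return self.w_adv * adv + self.w_quality * quality + self.w_ssim * ssim
```

In such a design, every term can be evaluated from the synthesized frames alone, which is what allows training without depth maps or ground-truth virtual views.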
