Abstract
We present See360, a versatile and efficient framework for 360° panoramic view interpolation using latent space viewpoint estimation. Most existing view rendering approaches focus only on indoor or synthetic 3D environments and render new views of small objects. In contrast, we propose to tackle camera-centered view synthesis as a 2D affine transformation without using point clouds or depth maps, which enables effective 360° panoramic scene exploration. Given a pair of reference images, the See360 model learns to render novel views with the proposed Multi-Scale Affine Transformer (MSAT), which enables coarse-to-fine feature rendering. We also propose a Conditional Latent space AutoEncoder (C-LAE) to achieve view interpolation at arbitrary angles. To show the versatility of our method, we introduce four training datasets, namely UrbanCity360, Archinterior360, HungHom360 and Lab360, which are collected from indoor and outdoor environments for both real and synthetic rendering. Experimental results show that the proposed method is generic enough to achieve real-time rendering of arbitrary views for all four datasets. In addition, our See360 model can be applied to view synthesis in the wild: with only a short extra training time (approximately 10 minutes), it is able to render unknown real-world scenes. The superior performance of See360 opens up a promising direction for camera-centered view rendering and 360° panoramic view interpolation.
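To make the phrase "2D affine transformation" concrete, the following is a minimal sketch of a plain affine warp on a small 2D grid, using inverse mapping with nearest-neighbour sampling. It is not the MSAT described above, which is a learned module operating on multi-scale features; the function names `affine_matrix` and `warp_affine`, the choice of rotating about the grid centre, and the sampling scheme are all assumptions made for illustration.

```python
import math


def affine_matrix(angle_deg, scale=1.0, tx=0.0, ty=0.0):
    """Hypothetical helper: 2x3 affine matrix (rotation, uniform scale, translation)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a) * scale, math.sin(a) * scale
    return [[c, -s, tx], [s, c, ty]]


def warp_affine(grid, m, fill=0):
    """Warp a 2D grid (list of rows) with a 2x3 affine matrix.

    Assumptions for this sketch: the transform is applied about the grid
    centre, and each output cell is filled by inverse mapping with
    nearest-neighbour sampling. Cells that map outside the source get `fill`.
    """
    h, w = len(grid), len(grid[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    a, b, tx = m[0]
    c, d, ty = m[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("affine matrix is not invertible")
    # Inverse of the 2x2 linear part.
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Output coordinate relative to the centre, with translation undone.
            ux, uy = x - cx - tx, y - cy - ty
            # Map back into the source grid.
            sx = ia * ux + ib * uy + cx
            sy = ic * ux + id_ * uy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = grid[iy][ix]
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in warp_affine(image, affine_matrix(90)):
        print(row)
```

In a learned setting the six matrix entries would be predicted by a network rather than set by hand, and the warp would typically use differentiable bilinear sampling on feature maps instead of nearest-neighbour lookup on pixels.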
Highlights
We present See360, a versatile and efficient framework for 360° panoramic view interpolation using latent space viewpoint estimation
Our method differs from novel view rendering since our goal is to capture the 3D structure of the surroundings rather than the structure of a single object
To render a novel view at a given camera pose, See360 extends traditional GANs by introducing a Conditional Latent space AutoEncoder (C-LAE) that maps the 3D camera pose to a 2D image projection
Summary
We present See360, a versatile and efficient framework for 360° panoramic view interpolation using latent space viewpoint estimation. We can use RGBD cameras to capture depth for 3-DoF or 6-DoF rendering, enabling depth estimation [1], [2], semantic segmentation [3], [4], [5] and salience prediction [6], [7], [8]. In contrast with both 360° video and novel view rendering, but bridging the gap between them, our goal is to achieve camera-centered, 360° panoramic novel view interpolation. The very small differences are mainly located around edges, which indicates that the global information matches in the low-frequency domain, despite some loss of high-frequency information. This good pixel fidelity makes it possible to use the generated images for other applications, such as semantic segmentation (see Figure 1(d)).