Abstract

Recent telepresence systems have shown significant improvements in quality compared to prior systems. However, they struggle to achieve both low cost and high quality at the same time. In this work, we envision a future where telepresence systems become a commodity and can be installed on typical desktops. To this end, we present a high-quality view synthesis method built on a cost-effective capture system consisting of commodity hardware accessible to the general public. We propose a neural renderer that takes a few RGBD cameras as input and synthesizes novel views of a user and their surroundings. At the core of the renderer is Multi-Layer Point Cloud (MPC), a novel 3D representation that improves reconstruction accuracy by removing the non-linear biases of depth cameras. Our temporally-aware renderer further improves the stability of synthesized videos by conditioning on past information. Additionally, we propose Spatial Skip Connections (SSC) to improve image upsampling under limited GPU memory. Experimental results show that our renderer outperforms recent methods in view synthesis quality. Our method generalizes to new users and challenging content (e.g., hand gestures and clothing deformation) without costly per-video optimization, object templates, or heavy pre-processing. The code and dataset will be made available.
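The abstract only names Spatial Skip Connections (SSC) and does not describe their design, so the snippet below is a minimal, hypothetical sketch of the general idea of a memory-aware skip connection for image upsampling, not the paper's actual SSC module. It assumes PyTorch; the module name `SkipUpsampleBlock` and its layers (`skip_proj`, `fuse`) are invented for illustration. The sketch keeps the full-resolution branch narrow so that most computation stays at low resolution, which is one plausible way to upsample under limited GPU memory.

```python
# Hypothetical sketch (not the paper's SSC): fuse upsampled coarse decoder
# features with a thin full-resolution skip tensor to limit GPU memory use.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkipUpsampleBlock(nn.Module):
    """Upsample coarse features and fuse them with a narrow high-resolution skip."""

    def __init__(self, coarse_channels: int, skip_channels: int, out_channels: int):
        super().__init__()
        # Keep the full-resolution branch narrow (few channels) to save memory.
        self.skip_proj = nn.Conv2d(skip_channels, 8, kernel_size=1)
        self.fuse = nn.Conv2d(coarse_channels + 8, out_channels, kernel_size=3, padding=1)

    def forward(self, coarse: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Bilinearly upsample coarse features to the skip (target) resolution.
        up = F.interpolate(coarse, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        # Project the high-resolution skip to a few channels before fusion.
        skip_feat = self.skip_proj(skip)
        return torch.relu(self.fuse(torch.cat([up, skip_feat], dim=1)))


if __name__ == "__main__":
    block = SkipUpsampleBlock(coarse_channels=64, skip_channels=3, out_channels=32)
    coarse = torch.randn(1, 64, 128, 128)   # low-resolution decoder features
    skip = torch.randn(1, 3, 512, 512)      # full-resolution guidance image
    print(block(coarse, skip).shape)        # torch.Size([1, 32, 512, 512])
```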
