Multi-layer Scene Representation from Composed Focal Stacks.

Reina Ishikawa,Hideo Saito,Shohei Mori,Denis Kalkofen

doi:10.1109/tvcg.2023.3320248

Abstract

Multi-layer images are a powerful scene representation for high-performance rendering in virtual/augmented reality (VR/AR). The major approach to generate such images is to use a deep neural network trained to encode colors and alpha values of depth certainty on each layer using registered multi-view images. A typical network is aimed at using a limited number of nearest views. Therefore, local noises in input images from a user-navigated camera deteriorate the final rendering quality and interfere with coherency over view transitions. We propose to use a focal stack composed of multi-view inputs to diminish such noises. We also provide theoretical analysis for ideal focal stacks to generate multi-layer images. Our results demonstrate the advantages of using focal stacks in coherent rendering, memory footprint, and AR-supported data capturing. We also show three applications of imaging for VR.

Full Text