We introduce a fully automated 360° video processing pipeline that uses a hierarchical combination of Artificial Intelligence (AI) modules to create immersive volumetric XR experiences. Two critical production tasks (person segmentation and depth estimation) are addressed with a parallel Deep Neural Network (DNN) pipeline that combines instance segmentation, person detection, pose estimation, camera stabilization, neural tracking, 3D face detection, hair masking, and monocular 360° depth computation in a single, robust tool set. To facilitate the rapid uptake of these techniques, we provide a detailed review of AI-based methods for these problems (complete with links to recommended open-source implementations) as well as references to existing authoring tools on the market. Our key contributions include a method for creating semi-synthetic data sets for automatic data augmentation, which we use to generate over 3.8 million images as part of a concise evaluation and subsequent retraining of DNNs for person detection tasks. Furthermore, we apply the same techniques to develop a spherical DNN for monocular depth estimation with a Free Viewpoint Video (FVV) capture system, together with a novel method for generating 3D human shape and pose mannequins for training. To evaluate the performance of our AI authoring tool set, we address four challenging production tasks and demonstrate the practical use of our solution with videos showing processed output.