Abstract

Over the last year the need for video conferencing has risen significantly due to the ongoing global pandemic. The goal of this project is to improve the user experience, which is currently limited to voice and a plain 2D image, by adding a third spatial dimension and thereby creating a more immersive setting. The Azure Kinect Developer Kit provides multiple cameras, namely an RGB camera and a depth camera. The depth camera is based on the time-of-flight (ToF) principle, casting modulated near-IR illumination onto the scene. The setup uses multiple Azure Kinect devices, synchronized and offset in space, to obtain a non-static 3D capture of a person. The Unity engine together with the Azure Kinect SDK is used to process the data gathered by all devices. First, a spatial depth map is created by combining the overlaid outputs from each device. Second, RGB pixels are mapped onto the depth points to provide the final texture of the 3D model. Because a continuous capture of raw data must be exported to a server, body-tracking and image-processing algorithms are applied. Finally, the processed data can be exported and used in AR, VR, or any other 3D-capable interface. This 3D projection aims to enhance the sensory experience by conveying non-verbal communication alongside classical speech in video conferences.

Keywords: 3D sensing, volumetric alignment, point cloud
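
As a rough illustration of the synchronized multi-device capture described above, the sketch below uses the Microsoft.Azure.Kinect.Sensor C# SDK (the C# wrapper that can be consumed from Unity) to open two devices connected by sync cables, one as master and one as subordinate. The class name, device indices, and chosen camera modes are illustrative assumptions, not the project's actual configuration.

using Microsoft.Azure.Kinect.Sensor;

public static class SyncCaptureSketch
{
    // Open two Azure Kinect devices and configure wired master/subordinate sync.
    // Device indices (0 = subordinate, 1 = master) are assumptions; in practice they
    // should be resolved from serial numbers and the sync-cable topology.
    public static (Device master, Device subordinate) OpenSynchronizedPair()
    {
        Device subordinate = Device.Open(0);
        Device master = Device.Open(1);

        var config = new DeviceConfiguration
        {
            ColorFormat = ImageFormat.ColorBGRA32,
            ColorResolution = ColorResolution.R720p,
            DepthMode = DepthMode.NFOV_Unbinned,
            CameraFPS = FPS.FPS30,
            SynchronizedImagesOnly = true   // deliver only captures containing both depth and color
        };

        // Subordinates must already be listening for the sync pulse before the master starts.
        config.WiredSyncMode = WiredSyncMode.Subordinate;
        subordinate.StartCameras(config);

        config.WiredSyncMode = WiredSyncMode.Master;
        master.StartCameras(config);

        return (master, subordinate);
    }
}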
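
The per-device depth-to-point and color-to-depth mapping steps can likewise be sketched with the SDK's Transformation API, again assuming the Microsoft.Azure.Kinect.Sensor C# wrapper; BuildColoredPoints and the hard-coded camera modes are hypothetical. Merging the per-device clouds into a single model additionally requires the device-to-device extrinsics from multi-camera calibration, which is not shown here.

using System;
using Microsoft.Azure.Kinect.Sensor;

public static class ColoredPointCloudSketch
{
    // For one device's capture: map the color image into the depth camera's geometry,
    // then unproject every depth pixel into a 3D point (millimetres, depth-camera frame).
    public static void BuildColoredPoints(Device device, Capture capture)
    {
        // Must match the configuration the cameras were started with.
        Calibration calibration = device.GetCalibration(
            DepthMode.NFOV_Unbinned, ColorResolution.R720p);

        using (Transformation transformation = calibration.CreateTransformation())
        using (Image colorInDepth = transformation.ColorImageToDepthCamera(capture))
        using (Image pointCloud = transformation.DepthImageToPointCloud(capture.Depth))
        {
            Span<Short3> points = pointCloud.GetPixels<Short3>().Span;
            Span<BGRA> colors = colorInDepth.GetPixels<BGRA>().Span;

            for (int i = 0; i < points.Length; i++)
            {
                if (points[i].Z == 0) continue;   // no valid depth at this pixel
                Short3 p = points[i];             // 3D point in the depth camera's frame
                BGRA c = colors[i];               // RGB texture sample mapped onto that point
                // In the full pipeline, (p, c) pairs would fill a Unity point-cloud or mesh buffer.
            }
        }
    }
}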
