Abstract

Visually generated spatial audio has been frequently explored in research on multimedia analysis. While these efforts have aimed at constructing congruent representations of recorded audiovisual experiences, such techniques have yet to be directly applied to the rendering of virtual soundscapes. Moreover, few have dealt with scenarios in which audio information about the captured physical environment is completely absent. This motivates the panoptic soundscape reconstruction and rendering technique presented in this work. The technique operates solely on captured panoramic visual environments together with both embedded and annotated contextual metadata. The visual information is processed using existing pre-trained deep neural networks (DNNs) for semantic segmentation and object detection, and the extracted information is converted into sound objects. The embedded metadata is used to probabilistically analyze underlying contextual information related to the acoustic environments and to generate predictions of their behavior. Both the sound objects and the predictions serve as inputs to a spatial audio manager, which generates virtual sound sources and schedules them accordingly. The resulting soundscapes are rendered and experienced with room-centered virtual reality systems, such as Rensselaer’s Collaborative-Research Augmented Immersive Virtual Environment Laboratory (CRAIVE-Lab), with potential adaptability to user-centered virtual reality devices. [Work supported by NSF No. 1909229, Cognitive Immersive Systems Laboratory (CISL), and Army DURIP No. 68604-CS-RIP.]
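
As a rough illustration of the pipeline described in the abstract (not the authors' implementation), the following Python sketch maps object detections on an equirectangular panorama to spatialized sound objects and schedules them according to context-derived probabilities. The SoundObject class, CLASS_TO_SAMPLE lookup, schedule function, and all class names, sample filenames, and probability values are hypothetical placeholders introduced only for this example.

```python
# Minimal sketch, assuming detections from a pre-trained object detector on an
# equirectangular panorama and a simple probabilistic scheduler. All names and
# values below are illustrative, not taken from the described system.
import random
from dataclasses import dataclass

@dataclass
class SoundObject:
    label: str           # detected class, e.g. "car" or "bird"
    azimuth_deg: float   # horizontal angle derived from panorama position
    elevation_deg: float # vertical angle derived from panorama position
    sample: str          # audio clip associated with the class (assumed lookup)

# Hypothetical mapping from detected classes to representative audio samples.
CLASS_TO_SAMPLE = {"car": "car_pass.wav", "bird": "bird_chirp.wav", "person": "footsteps.wav"}

def detection_to_sound_object(label, bbox, image_width, image_height):
    """Convert a detection (label, pixel bounding box) on an equirectangular
    panorama into a sound object with spherical coordinates."""
    x_center = (bbox[0] + bbox[2]) / 2.0
    y_center = (bbox[1] + bbox[3]) / 2.0
    azimuth = (x_center / image_width) * 360.0 - 180.0    # [-180, 180] degrees
    elevation = 90.0 - (y_center / image_height) * 180.0  # [-90, 90] degrees
    return SoundObject(label, azimuth, elevation, CLASS_TO_SAMPLE.get(label, "ambient.wav"))

def schedule(sound_objects, context_probabilities, duration_s=60.0):
    """Toy scheduler: each sound object is triggered at random times, with
    trigger density scaled by an assumed context-derived activity probability."""
    events = []
    for obj in sound_objects:
        p = context_probabilities.get(obj.label, 0.5)  # assumed per-class prior
        n_events = max(1, int(p * duration_s / 10.0))
        for _ in range(n_events):
            events.append((random.uniform(0.0, duration_s), obj))
    return sorted(events, key=lambda e: e[0])

if __name__ == "__main__":
    detections = [("car", (100, 800, 300, 900)), ("bird", (2500, 200, 2550, 240))]
    objs = [detection_to_sound_object(lbl, box, 4096, 2048) for lbl, box in detections]
    for t, obj in schedule(objs, {"car": 0.8, "bird": 0.4}):
        print(f"t={t:5.1f}s  play {obj.sample} at az={obj.azimuth_deg:.1f}, el={obj.elevation_deg:.1f}")
```

In a full system, the scheduled events would be handed to a spatial audio renderer (e.g., the loudspeaker array of a room-centered system such as the CRAIVE-Lab) rather than printed.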
