Abstract

Consumer virtual reality devices have become inexpensive and readily available, and offer high-quality motion tracking, low latency, and sufficient resolution to conduct ecologically valid perceptual experiments. Unfortunately, high-quality multimedia source material can be difficult to produce or obtain due to the specialized equipment and facilities required for capture. To facilitate a wide array of perceptual experiments in multi-modal speech perception, we have generated a multimedia corpus that includes stereoscopic and spherical videos, as well as anechoic audio. These materials can easily be placed into an interactive virtual reality environment and delivered over a head-mounted display to evaluate ventriloquism, lip-reading, spatial release from masking, or other perceptual effects that depend on audio-visual integration. The corpus replicates the Coordinate Response Measure sentences, as well as the Harvard IEEE corpus word list. Subjects were recorded at three different distances to preserve accurate binocular disparities in the videos, and the recordings can be positioned at arbitrary azimuthal positions. The corpus and a simple virtual reality application for positioning and viewing the videos on a head-mounted display have been made publicly available online for download. We detail the methods used to produce this content, as well as the use of the accompanying viewing application.
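
To illustrate the positioning principle described above (this is a hedged sketch, not the interface of the released viewing application), the Python fragment below places a talker video at a chosen azimuth while keeping it at the distance at which the talker was recorded, so the binocular disparities captured in the stereoscopic footage remain geometrically consistent for the viewer. The function name, coordinate axes, and rotation convention are assumptions made for illustration only.

    import math

    def place_talker(azimuth_deg, recording_distance_m):
        """Compute a head-relative position and yaw for a recorded talker video.

        The video panel is kept on a circle whose radius equals the distance at
        which the talker was filmed, so the stereoscopic disparities in the
        recording stay geometrically correct; only the azimuth is varied.
        (Coordinate convention assumed: +x to the viewer's right, +z straight ahead.)
        """
        azimuth = math.radians(azimuth_deg)
        x = recording_distance_m * math.sin(azimuth)
        z = recording_distance_m * math.cos(azimuth)
        yaw_deg = -azimuth_deg  # rotate the panel so it continues to face the viewer
        return (x, 0.0, z), yaw_deg

    # Example: a talker filmed at 2 m, presented 30 degrees to the viewer's right.
    position, yaw = place_talker(30.0, 2.0)

In practice, the same idea would simply be expressed in whatever scene graph the presentation software uses; the point is that azimuth is free to vary while viewing distance is fixed to the recording distance.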
