Abstract

The ventriloquism effect describes the phenomenon in which audio and visual signals with common features, such as a voice and a talking face, merge perceptually into one percept even if they are spatially misaligned. The boundaries of this fusion are of interest for the design of multimedia products, which need to be perceptually satisfactory. These boundaries have mainly been studied using continuous judgment scales and forced-choice measurement methods, and the results vary greatly between studies. The current experiment evaluates audio-visual fusion using reaction time (RT) measurements as an indirect method of measurement, with the aim of overcoming this variance. A two-alternative forced-choice (2AFC) word recognition test was designed and tested with noise and multi-talker speech background distractors. Visual signals were presented centrally, and audio signals were presented at audio-visual offsets between 0° and 31° in azimuth. RT data were analyzed separately for the underlying Simon effect and for attentional effects. For the attentional effects, three models were identified, but no single model could explain the observed RTs for all participants, so the data were grouped and analyzed accordingly. For the Simon effect, the results show significant differences in RTs from 5° to 10° onwards. The attentional effect varied at the same audio-visual offset for two of the three defined participant groups. In contrast with prior research, these results suggest that, even for speech signals, small audio-visual offsets influence spatial integration subconsciously.

Highlights

  • Audio-visual spatial perception has been studied for decades

  • The limits of audio-visual fusion are of interest to the multimedia industry, which works on immersive technologies that aim to recreate surrounding sound scenes and 3D images in a realistic and convincing manner

  • The results suggest that spatial mismatches as small as 5° are processed in subconscious brain areas across the dorsal stream and lead to response-priming and possible changes in spatial awareness

Summary

Introduction

Audio-visual spatial perception has been studied for decades. It has been shown that spatially separated signals may be perceived at the same position, the so-called ventriloquism effect.

Decorrelated pink noise at +10 dB signal-to-noise ratio (SNR) was played throughout the test from five loudspeakers placed at 0°, ±31°, and ±110°, as specified by the ITU-R (2012). This level was determined in a pre-test to provide approximately equal audio and visual error rates. In experiment two, multi-talker speech was reproduced as background interference. It was composed of eight competing speech signals, with two speech signals presented from each of four loudspeakers placed at ±31° and ±110°. The overall level of the multi-talker speech signal was kept at +10 dB SNR relative to the target speech signal, which was presented at 60 dB SPL.
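The level relationship for the multi-talker background can be illustrated with a short calculation. This is only a sketch under stated assumptions: it treats the eight competing talkers as uncorrelated and equal in level, which the text does not state, and the function names are hypothetical rather than taken from the study.

```python
import math


def interferer_level_db(target_db_spl: float, snr_db: float) -> float:
    """Overall interferer level (dB SPL) that yields the given SNR against the target."""
    return target_db_spl - snr_db


def per_source_level_db(total_db: float, n_sources: int) -> float:
    """Level of each source when n equal-level, uncorrelated sources sum to total_db.

    Assumption (not from the source text): the sources add on a power basis.
    """
    return total_db - 10.0 * math.log10(n_sources)


# Values from the text: target speech at 60 dB SPL, background at +10 dB SNR.
total = interferer_level_db(60.0, 10.0)
# Assumed split across the eight competing speech signals.
per_talker = per_source_level_db(total, 8)

print(f"overall background level: {total:.1f} dB SPL")
print(f"per-talker level (assumed equal, uncorrelated): {per_talker:.2f} dB SPL")
```

Under these assumptions, the overall background sits at 50 dB SPL and each talker at roughly 41 dB SPL; a different per-talker weighting would change the second figure but not the first.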
