Abstract

Auditory spatial localization in humans is performed using a combination of interaural time differences, interaural level differences, and spectral cues provided by the geometry of the ear. To render spatialized sounds within a virtual reality (VR) headset, either individualized or generic Head Related Transfer Functions (HRTFs) are usually employed. The former require arduous calibrations but enable accurate auditory source localization, which may lead to a heightened sense of presence within VR. The latter obviate the need for individualized calibrations but result in less accurate auditory source localization. Previous research on auditory source localization in the real world suggests that our representation of acoustic space is highly plastic. In light of these findings, we investigated whether auditory source localization could be improved for users of generic HRTFs via cross-modal learning. The results show that pairing a dynamic auditory stimulus with a spatio-temporally aligned visual counterpart enabled users of generic HRTFs to improve subsequent auditory source localization. Exposure to the auditory stimulus alone or to asynchronous audiovisual stimuli did not improve auditory source localization. These findings have important implications for human perception as well as for the development of VR systems, as they indicate that generic HRTFs may be sufficient to enable good auditory source localization in VR.
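In practice, HRTF-based rendering amounts to convolving a mono source signal with the left- and right-ear head-related impulse responses (HRIRs) measured for the desired direction. The sketch below is a minimal illustration of that idea, not the system used in the study; hrir_left and hrir_right are hypothetical arrays assumed to come from a generic HRTF set for a given azimuth and elevation.

    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(mono, hrir_left, hrir_right):
        # Convolve the mono signal with each ear's impulse response
        # and stack the results into an (n_samples, 2) stereo array.
        left = fftconvolve(mono, hrir_left)
        right = fftconvolve(mono, hrir_right)
        n = max(len(left), len(right))
        out = np.zeros((n, 2))
        out[:len(left), 0] = left
        out[:len(right), 1] = right
        return out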

Highlights

  • How we identify the source of sounds in space is determined largely by three acoustic cues: (a) interaural time differences (ITD), (b) interaural level differences (ILD), and (c) acoustic filtering, i.e., spectral cues derived from the shape of one’s ears, head, and torso (Møller et al., 1995; Majdak et al., 2014)

  • Subsequent research has investigated auditory localization performance in response to altered ITDs using generic Head Related Transfer Functions (HRTFs). In these experiments, participants’ auditory localization performance improved following a series of training sessions repeated over 2–6 weeks (Shinn-Cunningham et al., 1998). While these findings demonstrate improved localization performance following unimodal training, the long exposure periods required for only limited improvements make this an impractical solution for improving the perceptual experience of casual users of generic HRTFs

  • In the experiments presented here, we have demonstrated that pairing a visual stimulus with an auditory source in virtual 3D space for a duration as short as 60 s is sufficient to induce a measurable improvement in auditory spatial localization in virtual reality (VR)

Introduction

How we identify the source of sounds in space is determined largely by three acoustic cues: (a) interaural time differences (ITD), (b) interaural level differences (ILD), and (c) acoustic filtering, i.e., spectral cues derived from the shape of one’s ears, head, and torso (Møller et al., 1995; Majdak et al., 2014). Together, these cues provide us with a fairly accurate representation of acoustic space (Sabin et al., 2005).
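To give a rough sense of scale for the first of these cues, a commonly used approximation is Woodworth's spherical-head model, ITD ≈ (a / c)(θ + sin θ), for source azimuth θ in radians, head radius a, and speed of sound c. The snippet below is an illustrative approximation under those assumptions, not a formula taken from the paper; for a typical head radius it gives a maximum ITD of roughly 0.66 ms at 90° azimuth.

    import numpy as np

    def itd_woodworth(azimuth_rad, head_radius_m=0.0875, speed_of_sound=343.0):
        # Woodworth spherical-head approximation of the interaural
        # time difference (in seconds) for a far-field source.
        return (head_radius_m / speed_of_sound) * (azimuth_rad + np.sin(azimuth_rad))

    print(itd_woodworth(np.pi / 2))  # ~0.00066 s, i.e., about 0.66 ms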
