Abstract

We rarely become familiar with the voice of another person in isolation; we usually also have access to visual identity information, and thus learn to recognize their voice and face in parallel. Findings conflict as to whether learning to recognize voices in audiovisual vs. audio-only settings is advantageous or detrimental to learning. One prominent finding shows that the presence of a face overshadows the voice, hindering voice identity learning by capturing listeners' attention (the Face Overshadowing Effect; FOE). In the current study, we tested the proposal that the effect of audiovisual training on voice identity learning is driven by attentional processes. Participants learned to recognize voices through either audio-only training (Audio-Only) or one of three versions of audiovisual training in which a face was presented alongside the voices. During audiovisual training, the faces were either looking at the camera (Direct Gaze), looking to the side (Averted Gaze), or had closed eyes (No Gaze). We found a graded effect of gaze on voice identity learning: voice identity recognition was most accurate after audio-only training and least accurate after audiovisual training with direct gaze, constituting an FOE. While effect sizes were overall small, the magnitude of the FOE was halved in the Averted Gaze and No Gaze conditions. Given that direct gaze is associated with increased attention capture compared to averted or no gaze, the current findings suggest that incidental attention capture at least partially underpins the FOE. We discuss these findings in light of visual dominance effects and the relative informativeness of faces vs. voices for identity perception.
