‘Seeing tunes.’ The role of visual gestures in tune interpretation

Joan Borràs-Comes, Pilar Prieto

doi:10.1515/labphon.2011.013

Open Access

https://doi.org/10.1515/labphon.2011.013


Abstract

One of the unresolved questions in audiovisual prosody is the relative contribution of acoustic and visual cues to the expression of prosodic meaning. Though the majority of studies on audiovisual prosody have found a complementary mode of processing whereby sight provides relatively weak and redundant information in comparison with strong auditory cues, other work has found that sight provides information more efficiently than hearing. In Catalan, a pitch range contrast in a rising-falling nuclear configuration conveys a difference between a contrastive focus statement and an echo question. The main goal of this study is to investigate the relative contribution of visual cues in conveying this distinction. Twenty native speakers of Central Catalan participated in two identification tasks in which they had to decide between a focus statement and a question interpretation. Experiment 1 used a pitch range auditory continuum combined with two congruent and incongruent videotapes showing the facial gestures that are characteristic of the two pragmatic meanings. Experiment 2 used the same auditory continuum in combination with another continuum for facial gestures produced using a digital image-morphing technique. The responses and reaction times obtained in both experiments revealed a consistent reliance on visual cues in the listener's decisions, but also a consistent effect of the auditory stimulus. We argue that although facial gestures are the most influential elements that Catalan listeners rely on to decide between contrastive focus and echo question interpretations, bimodal integration with the acoustic cues is necessary for perceptual processing to be accurate and fast. Finally, we discuss the implications of these results for models of audiovisual processing.
