Abstract

Work in audiovisual speech processing (AVSP) has established that visual speech signals can influence auditory perception by improving the intelligibility of speech in noise (Sumby and Pollack, 1954). Exactly which aspects of the visible signal are most responsible for this enhancement remains an open question. Convergent evidence along several lines suggests that visible information may reflect a common articulatory-acoustic temporal signature, and that the multimodal availability of this temporal signature is at the root of the effect. We evaluated this hypothesis in a perceptual study using simple talking-face animations whose motion was driven by a signal derived from the collective motion of perioral structures of an actual talker. We applied spatial and temporal manipulations to the structure of this driving signal using a biologically plausible model that preserves the smoothness of the manipulated trajectory, and tested whether these kinematic manipulations influenced the perception of linguistic prominence, an important component of the timing and rhythm (prosody) of speech. The data suggest that perceivers are sensitive to these manipulations, and that the cross-correlation between the acoustic amplitude envelope and the manipulated visible signal was a strong predictor of perceived prominence.
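
The abstract does not say how the cross-correlation between the acoustic amplitude envelope and the visible signal was computed. As a rough illustration only, the sketch below shows one common way to compute a normalized cross-correlation between two one-dimensional signals. It assumes both signals have already been resampled to the same rate and trimmed to the same length. The function names `normalized_xcorr` and `peak_lag` are hypothetical and do not come from the paper.

```python
import numpy as np


def normalized_xcorr(envelope, visible):
    """Normalized cross-correlation of two equal-length 1-D signals.

    Assumption (not from the paper): both signals share one sampling rate.
    Returns (lags, r), where r[i] is the correlation at lag lags[i] (samples).
    """
    a = np.asarray(envelope, dtype=float)
    b = np.asarray(visible, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("signals must be 1-D and of equal length")

    # Remove the mean so the result reflects co-variation, not offset.
    a = a - a.mean()
    b = b - b.mean()

    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    if denom == 0.0:
        raise ValueError("signals must not be constant")

    # np.correlate(a, b, "full")[i] = sum_n a[n + k] * b[n], with k = lags[i].
    r = np.correlate(a, b, mode="full") / denom
    lags = np.arange(-(len(a) - 1), len(a))
    return lags, r


def peak_lag(envelope, visible):
    """Lag (in samples) and value of the maximum normalized correlation.

    A positive lag means the envelope trails the visible signal.
    """
    lags, r = normalized_xcorr(envelope, visible)
    i = int(np.argmax(r))
    return int(lags[i]), float(r[i])
```

With this convention, a positive peak lag means the visible motion leads the acoustic envelope. Dividing the lag by the common sampling rate gives the offset in seconds.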

