Abstract

This paper explores the relationship between audio and video representations of speech by examining how information in particular formants is complemented by facial information for different classes of speech sounds. Specifically, we investigate whether acoustic information removed by deleting formant combinations that usually signal contrasts between vowels (F1, F2), stops (F2), and approximants (F3) can be replaced by optical information derived from the talking face. Simultaneous audio and video recordings were made of speakers uttering CVC nonsense syllables constructed from vowels, stops, and approximants exhibiting varying degrees of facial motion and varying dependence on individual formant cues. Stimuli were then resynthesized from combinations of one, two, and three formants using a parallel formant synthesizer. Subjects performed an auditory identification task with stimuli presented in an audio-only condition, followed by a separate block of stimuli presented in an audio-visual condition. Source effects were examined using an inverse-filtered source and a synthetic source with constant source characteristics. There were no differences between source conditions, but differences were obtained between the audio-only and audio-visual conditions that reflect the relationship between facial and formant dynamics. These results complement recent research on perception of reduced video information and complete audio information [L. Lachs and D. B. Pisoni, J. Acoust. Soc. Am. 116, 507–518 (2004)]. [Work supported by NIH.]
