Abstract

Closed-set consonant identification, measured using nonsense syllables, has been commonly used to investigate the encoding of speech cues in the human auditory system. Such tasks also evaluate the robustness of speech cues to masking from background noise and their impact on auditory-visual speech integration. However, extending the results of these studies to everyday speech communication has been a major challenge due to acoustic, phonological, lexical, contextual, and visual speech cue differences between consonants in isolated syllables and in conversational speech. In an attempt to isolate and address some of these differences, recognition of consonants spoken in multisyllabic nonsense phrases (e.g., aBaSHaGa spoken as /ɑbɑʃɑɡɑ/) produced at an approximately conversational syllabic rate was measured and compared with consonant recognition using Vowel-Consonant-Vowel bisyllables spoken in isolation. After accounting for differences in stimulus audibility using the Speech Intelligibility Index, consonants spoken in sequence at a conversational syllabic rate were found to be more difficult to recognize than those produced in isolated bisyllables. Specifically, place- and manner-of-articulation information was transmitted better in isolated nonsense syllables than in multisyllabic phrases. The contribution of visual speech cues to place-of-articulation information was also lower for consonants spoken in sequence at a conversational syllabic rate. These data imply that estimates of auditory-visual benefit based on models of feature complementarity derived from isolated syllable productions may overestimate the real-world benefit of integrating auditory and visual speech cues.
