Abstract

Research has established that clear speech with an enhanced acoustic signal benefits segmental intelligibility. Less attention has been paid to the visible articulatory correlates of clear-speech modifications, or to clear-speech effects at the suprasegmental level (e.g., lexical tone). Questions thus arise as to the extent to which clear-speech cues are beneficial across input modalities and linguistic domains, and how the different resources are incorporated. These questions address the fundamental argument in clear-speech research concerning the trade-off between signal-based, phoneme-extrinsic modifications that strengthen overall acoustic salience and code-based, phoneme-specific modifications that maintain phonemic distinctions. In this talk, we report findings from our studies on audio-visual clear-speech production and perception, including vowels and fricatives differing in auditory and visual salience, as well as lexical tones, which are believed to lack visual distinctiveness. In a three-stream study, we use computer-vision techniques to extract visible facial cues associated with segmental and tonal productions in plain and clear speech, characterize distinctive acoustic features across speech styles, and compare audio-visual plain- and clear-speech perception. Findings are discussed in terms of how speakers and perceivers strike a balance between utilizing general salience-enhancing cues and category-specific cues across audio-visual modalities and speech styles with the aim of improving intelligibility.
