Abstract

Motivated by the source-filter model of speech production, researchers have extensively analyzed emotional speech using inverse-filtering methods. However, the relative contributions of glottal source and vocal tract cues to the perception of emotion in speech remain unclear, especially after removing the effects of the known dominant factors (e.g., F0, intensity, and duration). In the present study, glottal source and vocal tract parameters were estimated simultaneously, modified in a controlled way, and then used to resynthesize emotional Japanese vowels with a recently developed analysis-by-synthesis method. The resynthesized vowels were presented to native Japanese listeners with normal hearing, who rated the perceived emotions on the valence and arousal dimensions. Results showed that glottal source information played a dominant role in the perception of emotion in vowels, while vocal tract information contributed to valence and arousal perception after the effects of the F0, intensity, and duration cues were neutralized.
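As a rough illustration of the source-filter idea underlying this kind of analysis-by-synthesis manipulation (not the method used in the study), the sketch below generates a vowel-like signal by passing a periodic glottal-like excitation through an all-pole vocal tract filter. The F0 value, formant frequencies, and bandwidths are illustrative assumptions, not values taken from the paper; changing the source parameters or the filter parameters independently corresponds to the kind of controlled modification described above.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000          # sampling rate (Hz), assumed
dur = 0.5           # vowel duration (s), assumed
f0 = 120.0          # fundamental frequency (Hz), assumed "neutral" value

# Glottal source: a simple impulse train at F0
# (a crude stand-in for a parametric glottal flow model).
n = int(fs * dur)
source = np.zeros(n)
period = int(fs / f0)
source[::period] = 1.0

# Vocal tract: cascade of second-order resonators (all-pole filter)
# at assumed formant frequencies and bandwidths for an /a/-like vowel.
formants = [730.0, 1090.0, 2440.0]   # assumed F1-F3 (Hz)
bandwidths = [60.0, 90.0, 120.0]     # assumed bandwidths (Hz)

a = np.array([1.0])
for f, bw in zip(formants, bandwidths):
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * f / fs
    # pole pair for one formant
    a = np.convolve(a, [1.0, -2 * r * np.cos(theta), r * r])

# Filter the source through the vocal tract model and normalize.
vowel = lfilter([1.0], a, source)
vowel /= np.max(np.abs(vowel))

# Modifying `f0` or `source` (glottal cues) independently of `formants`
# (vocal tract cues) mimics, very loosely, the controlled manipulations
# whose perceptual effects the study evaluates.
```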
