Abstract

The paper reports on a cross-language study of how voice quality (VQ) and f0 combine in the signaling of affect. Speakers of Irish English and Japanese participated in perception tests. The stimuli consisted of a short utterance in which f0 and voice source parameters were varied using the LF-model implementation of the KLSyn88a formant synthesizer. They were of three types: (1) VQ-only stimuli, with voice quality variations and a neutral f0 contour; (2) f0-only stimuli, with different affect-related f0 contours and modal voice; and (3) VQ+f0 stimuli, in which the voice qualities of (1) were combined with specific f0 contours from (2). Overall, stimuli involving voice quality variation were consistently associated with affect. Among the f0-only stimuli (2), only those with high f0 yielded high affective ratings. Striking differences emerged between the ratings obtained from the two language groups. The results show not only that some affects were consistently perceived by one language group but not by the other, but also that specific voice qualities and pitch contours were associated with very different affects by the two groups. The results have important implications for expressive speech synthesis, indicating that language- and culture-specific differences need to be considered. [This work is supported by the EU-funded Network of Excellence on Emotion, HUMAINE.]
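
The stimuli were built with an LF-model (Liljencrants-Fant) implementation of KLSyn88a. To illustrate what such a source model computes, here is a minimal sketch that generates one period of the LF glottal flow derivative from the usual timing parameters tp, te, ta and tc and the excitation amplitude Ee. It is not the KLSyn88a implementation and does not reproduce the settings used in this study. Times are normalised to a period of 1. The function names lf_pulse and solve_epsilon and the parameter values in the usage example are hypothetical. The implicit constants epsilon and alpha are solved numerically: epsilon by fixed-point iteration, and alpha by bisection so that the net flow over the period is zero.

```python
import math


def solve_epsilon(ta: float, te: float, tc: float, iters: int = 200) -> float:
    """Solve eps * ta = 1 - exp(-eps * (tc - te)) for its nonzero root.

    Starting at 1/ta (above the root), the fixed-point iteration decreases
    monotonically onto it, assuming 0 < ta < tc - te.
    """
    d = tc - te
    eps = 1.0 / ta
    for _ in range(iters):
        eps = (1.0 - math.exp(-eps * d)) / ta
    return eps


def lf_pulse(tp: float, te: float, ta: float, tc: float = 1.0,
             ee: float = 1.0, n: int = 10000):
    """Hypothetical helper: one period of the LF glottal flow derivative.

    Returns (times, samples) on n + 1 evenly spaced points in [0, tc].
    """
    if not (0.0 < tp < te < tc and te < 2.0 * tp and 0.0 < ta < tc - te):
        raise ValueError("need 0 < tp < te < min(2*tp, tc) and 0 < ta < tc - te")
    wg = math.pi / tp                              # open-phase angular frequency
    s, c = math.sin(wg * te), math.cos(wg * te)    # s < 0 since tp < te < 2*tp
    d = tc - te
    eps = solve_epsilon(ta, te, tc)

    # Closed-form area under the return phase (negative).
    area_ret = -(ee / (eps * ta)) * ((1.0 - math.exp(-eps * d)) / eps
                                      - d * math.exp(-eps * d))

    def area_open(alpha: float) -> float:
        # Integral of E0*exp(alpha*t)*sin(wg*t) over [0, te], with E0 fixed
        # so that the open phase reaches -ee at t = te (continuity).
        num = alpha * s - wg * c + wg * math.exp(-alpha * te)
        return -(ee / s) * num / (alpha ** 2 + wg ** 2)

    def balance(alpha: float) -> float:
        # Net flow over one period; the LF model requires this to be zero.
        return area_open(alpha) + area_ret

    # Bracket the root: balance is positive for small alpha, negative for large.
    lo, hi = -1.0, 1.0
    for _ in range(60):
        if balance(lo) > 0.0:
            break
        lo *= 2.0
    for _ in range(60):
        if balance(hi) < 0.0:
            break
        hi *= 2.0
    for _ in range(200):                           # plain bisection on alpha
        mid = 0.5 * (lo + hi)
        if balance(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    e0 = -ee / (math.exp(alpha * te) * s)

    times, samples = [], []
    for k in range(n + 1):
        t = k * tc / n
        if t <= te:                                # open phase: growing sinusoid
            v = e0 * math.exp(alpha * t) * math.sin(wg * t)
        else:                                      # return phase: exponential recovery
            v = -(ee / (eps * ta)) * (math.exp(-eps * (t - te)) - math.exp(-eps * d))
        times.append(t)
        samples.append(v)
    return times, samples
```

For example, lf_pulse(0.45, 0.6, 0.03) gives a pulse whose negative peak is close to -1 (the normalised Ee) at t = 0.6. In such a sketch, a larger ta gives a more gradual return phase and less high-frequency energy. This is one way changes to source parameters can mimic voice quality variation.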
