Abstract

Expressive speech synthesis has recently received much attention. Stress (or pitch accent) is the perceptual prominence within words or utterances; it is an important feature in shaping the highs and lows of the pitch contour, which make speech sound more expressive. In this chapter, we introduce a large-scale, stress-annotated corpus of continuous Mandarin speech. We then analyze the stress distribution and its stability with respect to rhythm level and tone pattern. Based on these results, we propose a novel hierarchical method for modeling Mandarin stress. The top level emphasizes stressed syllables, while the bottom level, for the first time, focuses on unstressed syllables because of their importance to both the naturalness and the expressiveness of synthetic speech. We also carry out several experiments on assigning Mandarin stress from textual features, using classification and regression tree (CART) and maximum entropy (ME) models, respectively. This work could help speech synthesis systems generate highly natural and expressive speech.

