Abstract

This paper proposes a parameter mapping model for converting neutral speech to sad speech. The model is derived by comparing statistical parameters of neutral and sad speech sample pairs with the same text content. We found that sad speech generally has a lower fundamental frequency (F0) than neutral speech and a more stable F0 contour, while its formants are slightly higher. In terms of rhythm, sad speech is slightly slower than neutral speech, and voiced and voiceless segments behave significantly differently: voiceless segments are significantly longer in sad speech. Neutral-to-sad speech conversion was implemented using this model and produced good results.
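
The abstract does not state the functional form of the mapping, so the following is only a minimal sketch of how a statistical parameter mapping of this kind might look. It assumes a linear mapping of the F0 mean and spread estimated from one paired sample, plus separate duration ratios for voiced and voiceless segments. The function names (`fit_f0_mapping`, `convert_f0`, `stretch_durations`) and all numeric values are hypothetical illustrations, not the paper's method or data.

```python
from statistics import mean, pstdev


def fit_f0_mapping(neutral_f0, sad_f0):
    """Estimate (mean_ratio, spread_ratio) from paired F0 contours in Hz.

    Assumption: both contours contain voiced frames only and come from
    utterances with the same text content.
    """
    mean_ratio = mean(sad_f0) / mean(neutral_f0)
    spread_ratio = pstdev(sad_f0) / pstdev(neutral_f0)
    return mean_ratio, spread_ratio


def convert_f0(neutral_f0, mean_ratio, spread_ratio):
    """Shift the F0 mean and compress deviations around it.

    A mean_ratio below 1 lowers F0; a spread_ratio below 1 flattens
    (stabilizes) the contour.
    """
    mu = mean(neutral_f0)
    return [mu * mean_ratio + (f - mu) * spread_ratio for f in neutral_f0]


def stretch_durations(segments, voiced_ratio, voiceless_ratio):
    """Scale segment durations, with a separate ratio per segment type.

    segments: list of (is_voiced, duration_in_seconds) tuples.
    """
    return [
        (voiced, dur * (voiced_ratio if voiced else voiceless_ratio))
        for voiced, dur in segments
    ]


# Toy values chosen for illustration only (not measurements from the paper).
neutral = [200.0, 220.0, 240.0]
sad = [180.0, 190.0, 200.0]
mean_ratio, spread_ratio = fit_f0_mapping(neutral, sad)
converted = convert_f0(neutral, mean_ratio, spread_ratio)
stretched = stretch_durations([(True, 0.2), (False, 0.1)], 1.0, 1.5)
```

Keeping the voiced and voiceless duration ratios separate reflects the abstract's observation that the two segment types behave differently; formant shifting is omitted here because it requires a spectral model that the abstract does not describe.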
