Abstract

This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by minimizing modification operations in the synthesis phase. The use of natural F0 shapes has great potential to cover a wide variety of speaking styles with the same framework, including not only read-aloud speech, but also dialogues and emotional speech. A linear-regression statistical model is used to manipulate the stored raw F0 shapes to build them up into a sentential F0 contour. Through experimental evaluations, the proposed model is shown to provide stable and robust F0 contour prediction for various speakers. By using this model, linguistically derived information about a sentence can be directly mapped, in a purely data-driven manner, to acoustic F0 values of the sentential intonation contour for a given target speaker.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call