Abstract

For the generation of highly natural synthetic speech, the control of prosody is of primary importance. The fundamental frequency (F0) is one of the most important components of speech prosody. This research investigates the variation of F0 in continuous Cantonese speech, with the goal of establishing an effective mechanism of prosody control in Cantonese text-to-speech (TTS) applications. Cantonese is a commonly used Chinese dialect that is well known for being rich in tones. This article describes a simple yet effective approach to the analysis and modeling of F0. The surface F0 contour of a continuous Cantonese utterance is considered to be the combination of a global component--phrase-level intonation curve, and local components--syllable-level tone contoursA novel method of F0 normalization is proposed to separate the local components from the global one. As a result, the variation in tone contours is greatly reduced. Statistical analysis is performed for the phrase curves and context-dependent tone contours that are extracted from a large corpus of 1,200 utterances. Specifically, the analysis is focused on co-articulated tone contours for disyllabic words, cross-word contours, and phrase-initial tone contours. Based on the results of the analysis, a template-based model for F0 generation is established and integrated with a Cantonese TTS system. Subjective listening tests show that the proposed model significantly improves the naturalness of the output speech.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.