Abstract

In this paper, some approaches to the generation of synthesis units and prosodic information are proposed for Mandarin Chinese text-to-speech (TTS) conversion. The monosyllables are adopted as the basic synthesis units. A set of synthesis units is selected from a large continuous speech database based on two cost functions, which minimize the inter- and intra-syllable distortion. The speech database is also employed to establish a word-prosody-based template tree according to the linguistic features: tone combination, word length, part-of-speech (POS) of the word, and word position in a phrase. This template tree stores the prosodic features including pitch contour, average energy, and syllable duration of a word for possible combinations of linguistic features. Two modules for sentence intonation and template selection are proposed to generate the target prosodic templates. The experimental results showed that the synthesized prosodic features matched quite well with their original counterparts. Evaluation by subjective experiments also confirmed the satisfactory performance of these approaches.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call