Abstract

This work presents a study of Mandarin speech that analyzes the consistency between the spectrum and the prosody within syllables. Identified by inspecting the human pronunciation process, this consistency can be interpreted as a high correlation between the warping curves of the spectrum and the prosody within a syllable. The consistency analysis consisted of three steps. First, a hidden Markov model (HMM) was used to decode the HMM state sequence within a syllable while simultaneously dividing each syllable into three segments. Second, for a designated syllable, vector quantization (VQ) codebooks for the prosodic vector of each segment were trained with the Linde–Buzo–Gray algorithm. Third, the prosodic vector of each segment was encoded as an index using the VQ codebooks, and the probability of each possible path was then evaluated as the basis for the consistency analysis. Finally, two syllables were used as examples to verify the consistency property found in the experiments. The experiments show that consistency clearly holds when the syllable occurs in exactly the same word. These results suggest that text-to-speech systems should account for the warping process between the spectrum and the prosody within a syllable to improve synthesized speech quality. Copyright © 2013 John Wiley & Sons, Ltd.
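
The second and third steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the HMM segmentation has already produced three prosodic vectors per syllable token, and the function names (`lbg`, `encode`, `path_probabilities`), the additive splitting perturbation `eps`, and the relative-frequency estimate of path probability are all assumptions made for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def encode(vec, codebook):
    """Index of the codeword nearest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def lbg(vectors, size, eps=0.01, tol=1e-6, max_iter=100):
    """Train a VQ codebook by Linde-Buzo-Gray splitting.

    The codebook doubles on every split, so `size` is assumed to be a
    power of two; otherwise the result is the next power of two.
    """
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c + s * eps for c in cw]
                    for cw in codebook for s in (1, -1)]
        prev = float("inf")
        for _ in range(max_iter):
            # Nearest-neighbour partition of the training vectors.
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[encode(v, codebook)].append(v)
            # Centroid update; an empty cell keeps its old codeword.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
            dist = sum(sq_dist(v, codebook[encode(v, codebook)])
                       for v in vectors) / len(vectors)
            # Stop once the mean distortion no longer improves.
            if prev - dist <= tol * max(dist, 1e-12):
                break
            prev = dist
    return codebook


def path_probabilities(tokens, codebooks):
    """Relative frequency of each index path over all syllable tokens.

    tokens: list of tokens, each a list of per-segment prosodic vectors.
    codebooks: one trained codebook per segment.
    """
    paths = Counter(
        tuple(encode(seg, cb) for seg, cb in zip(token, codebooks))
        for token in tokens
    )
    total = sum(paths.values())
    return {path: count / total for path, count in paths.items()}
```

In this sketch, a path is the tuple of codebook indices for the three segments of one token. Under that reading, probability mass concentrated on a few paths would indicate that the segment-level prosody varies consistently across tokens.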
