Abstract

Problem statement: In HMM-based Thai speech synthesis, tone quality degrades because the training data are imbalanced across tones. In addition, distortion of syllable duration is clearly noticeable when the system is trained with a small amount of data. These problems reduce the naturalness and intelligibility of the synthesized speech. Approach: This study proposes an approach to improve the tone correctness of speech generated by an HMM-based Thai speech synthesis system. In the tree-based context clustering process, tone groups and tone types are used to design four different decision-tree structures: a single binary tree structure, a simple tone-separated tree structure, a constancy-based-tone-separated tree structure and a trend-based-tone-separated tree structure. Results: A subjective evaluation of tone correctness was conducted using the tone perception of eight Thai listeners. The simple tone-separated tree structure gave the highest level of tone correctness, while the single binary tree structure gave the lowest. Adding contextual tone information to all of the decision-tree structures achieved a significant improvement in tone correctness. Finally, the evaluation of syllable duration distortion among the four structures showed that the constancy-based-tone-separated and the trend-based-tone-separated tree structures can alleviate the distortions that appear when using the simple tone-separated tree structure. Conclusion: An appropriate tree structure in the context clustering process, combined with the additional contextual tone information, can improve the correctness of tones, while the constancy-based-tone-separated and the trend-based-tone-separated tree structures can alleviate the syllable duration distortions.

Highlights

  • This study proposes alternative decision-tree structures designed to maximize tone correctness and to eliminate syllable duration distortion

  • Speech database and training conditions: A set of phonetically balanced sentences from the Thai speech database TSynC, from the National Electronics and Computer Technology Center, was used for training the HMMs (Hansakunbuntheung et al., 2005)

  • Phoneme labels included in TSynC and the utterance structure from ORCHID were used to construct the context-dependent labels with 79 different phonemes, including 65 phonemes from original Thai words, 12 phonemes from some loan words and 2 others

  • In the static tone group of the constancy-based-tone-separated tree and the downward trend group of the trend-based-tone-separated tree, no tone separations are applied, because the data sharing among the tones within those groups is expected to treat the problem of syllable duration distortion
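As a rough illustration of how the four tree structures named in the abstract could partition training data by tone, the sketch below maps each tone to the decision tree that would model it. It is a simplification under stated assumptions, not the authors' implementation. The function names `tree_key` and `trees_for` are hypothetical. The static group (mid, low, high) follows the usual phonological classification of Thai tones and is assumed to match the paper's constancy grouping. The members of the downward trend group are not given in this text, so `DOWNWARD_TONES` is only a placeholder.

```python
# The five lexical tones of Thai.
TONES = ("mid", "low", "falling", "high", "rising")

# Assumption: the usual static (level) tone group of Thai phonology.
STATIC_TONES = {"mid", "low", "high"}

# Hypothetical placeholder: the source text does not list the members
# of the downward trend group.
DOWNWARD_TONES = {"low", "falling"}


def tree_key(tone, structure):
    """Return a label for the decision tree that would model this tone."""
    if tone not in TONES:
        raise ValueError(f"unknown tone: {tone}")
    if structure == "single":
        # One binary tree shared by all tones.
        return "all"
    if structure == "simple":
        # One tree per tone, no data sharing across tones.
        return tone
    if structure == "constancy":
        # Static tones share one tree; the remaining tones are separated.
        return "static" if tone in STATIC_TONES else tone
    if structure == "trend":
        # Downward-trend tones share one tree; the remaining tones are separated.
        return "downward" if tone in DOWNWARD_TONES else tone
    raise ValueError(f"unknown structure: {structure}")


def trees_for(structure):
    """List the distinct trees a structure would build."""
    return sorted({tree_key(tone, structure) for tone in TONES})
```

Under these assumptions, `trees_for("simple")` yields one tree per tone, while the constancy-based structure pools the static tones into a single tree, which is the data sharing that the text expects to reduce duration distortion.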

Summary

INTRODUCTION

Applied Sci., 9 (3): 313-320, 2012

Tone must be carefully taken into account in speech synthesis systems of tonal languages such as Thai and Mandarin, especially for the purpose of producing natural-sounding prosody. At present, HMM-based speech synthesis is becoming popular, and an HMM-based speech synthesis system for Thai has been under development for years. It has been found that it can provide speech with the better ... when the system is trained with a small amount of data. To treat this problem, this study proposes alternative decision-tree structures designed to maximize tone correctness and to eliminate syllable duration distortion. Contextual tone information (the tone types of the preceding and succeeding syllables) is applied to the designed decision-tree structures.
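The contextual tone information mentioned above can be pictured as attaching, to each syllable, the tone of its neighbours. The following is a minimal sketch of that idea only. The function name `tone_contexts` and the boundary symbol `"x"` are assumptions made for illustration and do not reflect the label format used in the paper.

```python
def tone_contexts(tones, boundary="x"):
    """Pair each syllable's tone with the tones of its neighbours.

    Returns one (preceding, current, succeeding) tuple per syllable.
    Utterance edges are filled with a hypothetical boundary symbol.
    """
    padded = [boundary, *tones, boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


# Example: a three-syllable utterance.
for context in tone_contexts(["mid", "falling", "low"]):
    print(context)
```

Each tuple could then serve as extra context for the questions asked during tree-based clustering, so that a syllable's tone model can depend on the tones around it.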

MATERIALS AND METHODS
RESULTS
DISCUSSION
CONCLUSION
