Abstract

Problem statement: Tone distortion in Thai, a tonal language, can degrade not only the intelligibility of synthesized speech but also its naturalness. The correctness of tone must therefore be carefully taken into account in continuous speech synthesis. Preliminary work encountered this problem when applying HMM-based speech synthesis to Thai. Approach: This study presents speaker-dependent and speaker-independent Hidden Markov Model (HMM)-based Thai speech synthesis systems. In the speaker-dependent system, we developed a simple tone-separated tree structure for the tree-based context clustering in the training stage to address the tone distortion problem. In the speaker-independent (average-voice-model) system, a number of tonal features are extracted and applied with the Speaker Adaptive Training (SAT) and Shared-decision-tree Context Clustering (STC) techniques to alleviate the tone distortion problem. Results: Our objective evaluation revealed that the proposed features could bring the synthesized F0 contour closer to the target speaker's real contour. Our subjective test also revealed that the proposed tonal features could improve tone intelligibility in all speech-model scenarios, for both male and female speakers. Conclusion: By applying our approach, the problem of tone distortion can be relieved effectively. Better tone correctness can significantly improve both the intelligibility and the naturalness of speech.
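The abstract reports that the proposed features bring the F0 contour closer to the target speaker's contour, but it does not state which objective measure was used. A common choice is the root-mean-square error of F0 in cents over frames that are voiced in both contours. The sketch below assumes that measure, frame-aligned contours of equal length, and F0 = 0 for unvoiced frames; the function name `f0_rmse_cents` is hypothetical and not taken from the study.

```python
import math
from typing import Sequence


def f0_rmse_cents(synth: Sequence[float], target: Sequence[float]) -> float:
    """Hypothetical helper: RMSE in cents between two frame-aligned F0 contours (Hz).

    Assumes both contours have one value per frame and use F0 <= 0 for
    unvoiced frames; frames unvoiced in either contour are skipped.
    """
    if len(synth) != len(target):
        raise ValueError("contours must be frame-aligned (same length)")
    # 1200 * log2(ratio) converts a frequency ratio to cents.
    sq_errors = [
        (1200.0 * math.log2(s / t)) ** 2
        for s, t in zip(synth, target)
        if s > 0 and t > 0
    ]
    if not sq_errors:
        raise ValueError("no frames are voiced in both contours")
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

Measuring in cents rather than Hz makes an error of one octave count the same for a low-pitched male voice and a higher-pitched female voice, which matters when comparing male and female scenarios side by side.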

Highlights

  • Thai speech synthesis has been widely developed along two approaches

  • An approach of Hidden Markov Model (HMM)-based Thai speech synthesis is presented in this study

  • The speaker-dependent system achieved high tone intelligibility using simple tone-separated tree-based context clustering
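One way to picture tone-separated context clustering is as a decision tree whose root is forced to split on tone, so that no leaf ever pools HMM states from different tones; ordinary context questions are asked only below that split. The following is a minimal sketch under simplifying assumptions, not the study's implementation: each state is reduced to a single scalar (for example its mean log F0), splits are chosen greedily by reduction in squared error rather than by the likelihood or MDL criterion used in real HMM toolkits, and all names (`Sample`, `Leaf`, `tone_separated_clustering`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class Sample:
    context: dict  # hypothetical context, e.g. {"tone": "high", "phone": "a"}
    value: float   # one scalar per HMM state, e.g. its mean log F0


@dataclass
class Leaf:
    tone: str
    questions: list = field(default_factory=list)  # path of (feature, value, answer)
    samples: list = field(default_factory=list)


def sse(samples):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not samples:
        return 0.0
    m = fmean(s.value for s in samples)
    return sum((s.value - m) ** 2 for s in samples)


def best_split(samples, min_leaf):
    """Greedy search over 'feature == value' questions, excluding tone."""
    base = sse(samples)
    candidates = {(k, v) for s in samples for k, v in s.context.items() if k != "tone"}
    best = None
    for key, val in sorted(candidates):  # sorted for deterministic tie-breaking
        yes = [s for s in samples if s.context.get(key) == val]
        no = [s for s in samples if s.context.get(key) != val]
        if len(yes) < min_leaf or len(no) < min_leaf:
            continue
        gain = base - sse(yes) - sse(no)
        if best is None or gain > best[0]:
            best = (gain, key, val, yes, no)
    return best


def grow(tone, samples, path, min_gain, min_leaf):
    """Recursively split one tone's samples until the gain is too small."""
    split = best_split(samples, min_leaf)
    if split is None or split[0] < min_gain:
        return [Leaf(tone, path, samples)]
    _, key, val, yes, no = split
    return (grow(tone, yes, path + [(key, val, True)], min_gain, min_leaf)
            + grow(tone, no, path + [(key, val, False)], min_gain, min_leaf))


def tone_separated_clustering(samples, min_gain=1e-3, min_leaf=1):
    """Root split on tone first, then an independent subtree per tone."""
    by_tone = defaultdict(list)
    for s in samples:
        by_tone[s.context["tone"]].append(s)
    leaves = []
    for tone in sorted(by_tone):
        leaves.extend(grow(tone, by_tone[tone], [], min_gain, min_leaf))
    return leaves
```

Because tone is consumed at the root, a question such as "is the phone /a/?" can only merge states that already share a tone, which is the intuition behind using a tone-separated tree to reduce tone distortion.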


Summary

Introduction

Thai speech synthesis has been widely developed along two approaches. The first paper describing the development of a Thai TTS engine was published in 1983[3]; it applied a speech-unit concatenation algorithm to Thai. This approach was implemented in the latest version of Vaja[4] at the National Electronics and Computer Technology Center (NECTEC). To achieve various voice characteristics with systems based on this approach, a large amount of speech data is necessary. To address this problem, HMM-based speech synthesis, originally developed for Japanese, was adapted to Thai by Chomphan in 2007.

