Abstract

In recent years, globalization has highlighted the importance of machines that can provide customized communication in different languages. Most research in the field focuses on developing technologies for widely used languages such as English. In this study, we apply HMM-based speech synthesis (HTS) technology to the Indonesian language. The proposed hybrid HTS-based framework, PFHTS-IDSS, uses phonemes and full-context labels (lab) to synthesize Indonesian speech with higher accuracy. First, we identify a list of Indonesian phonemes following the initial-final structure used for the Chinese language. On this basis, we add zero-initials suited to both Indonesian acoustic characteristics and HTS, which makes the synthesized speech natural and smooth. Second, we treat Indonesian phonemes as synthesis units and synthesize speech using triphone and full-context labels. In addition, we design the context properties of the full-context labels and the corresponding question set used to train the acoustic model, which can eliminate machine-like sounds. Experimental results suggest that PFHTS-IDSS significantly improves both phoneme segmentation accuracy (PSA) and speech synthesis naturalness (SSN). In particular, the PSA obtained when phonemes are selected as synthesis units reaches 88.3%, and the corresponding SSN based on full-context labels is 4.1. The results presented in this paper may be applied in multilingual free-interaction systems to promote better communication in voice navigation, intelligent speakers, and question-answering systems.
