Abstract

This work applies neural sequence-to-sequence speech synthesis to Thai text-to-speech (TTS). First, and most importantly, we investigate the most appropriate input unit for neural sequence-to-sequence Thai TTS. We found that the system with phoneme input was superior to the system with character input. Second, we explored the benefits of word and/or syllable boundary information in both the character-based and phoneme-based systems. We found that word delimiting can improve the naturalness of the synthesized speech, which indicates that word segmentation is a necessary text-processing step in neural sequence-to-sequence Thai TTS. Finally, we evaluated our TTS system in a realistic setting. Our results indicated that the phoneme-based neural TTS generates synthesized speech of comparably high quality whether it uses ground-truth or predicted input features.
