Investigation of an Input Sequence on Thai Neural Sequence-to-Sequence Speech Synthesis

Pongsathon Janyoi, Ausdang Thangthai

doi:10.1109/o-cocosda202152914.2021.9660417

Abstract

This work applies neural sequence-to-sequence speech synthesis to Thai text-to-speech (TTS). First, and most importantly, we investigate which input unit is most appropriate for neural sequence-to-sequence Thai TTS, and find that a system with phoneme input is superior to one with character input. Second, we explore the benefit of word and/or syllable boundary information in both character-based and phoneme-based systems, and find that word delimiting improves the naturalness of the synthesized speech. This suggests that word segmentation is a necessary text-processing step for neural sequence-to-sequence Thai TTS. Finally, we evaluate our TTS system in a real-world setting. The results indicate that the phoneme-based neural TTS generates synthesized speech of the same high quality whether it is given ground-truth or predicted input features.
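To make the input variants concrete, the following is a minimal sketch of how a segmented utterance might be flattened into a token sequence with optional word and syllable boundary markers. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_input_sequence`, the marker tokens `<w>` and `<s>`, and the nested-list representation are all hypothetical. The same function would apply to phoneme input if each syllable were given as a list of phoneme symbols instead of characters.

```python
from typing import List

# Hypothetical marker tokens; the paper does not specify its boundary symbols.
WORD_BOUNDARY = "<w>"
SYLLABLE_BOUNDARY = "<s>"

# An utterance is a list of words, a word is a list of syllables,
# and a syllable is a list of units (characters or phoneme symbols).
Utterance = List[List[List[str]]]


def build_input_sequence(
    utterance: Utterance,
    mark_words: bool = False,
    mark_syllables: bool = False,
) -> List[str]:
    """Flatten a segmented utterance into one token sequence.

    A word boundary is also a syllable boundary, so when only syllable
    marking is enabled the syllable marker is used between words as well.
    """
    tokens: List[str] = []
    for w_idx, word in enumerate(utterance):
        if w_idx > 0:
            if mark_words:
                tokens.append(WORD_BOUNDARY)
            elif mark_syllables:
                tokens.append(SYLLABLE_BOUNDARY)
        for s_idx, syllable in enumerate(word):
            if s_idx > 0 and mark_syllables:
                tokens.append(SYLLABLE_BOUNDARY)
            tokens.extend(syllable)
    return tokens


# Character-based example: the two words "ภาษา" (syllables "ภา", "ษา") and "ไทย".
chars: Utterance = [[list("ภา"), list("ษา")], [list("ไทย")]]

plain = build_input_sequence(chars)
with_words = build_input_sequence(chars, mark_words=True)
with_both = build_input_sequence(chars, mark_words=True, mark_syllables=True)
```

Here `plain` is the bare character sequence, `with_words` adds a single `<w>` between the two words, and `with_both` additionally places `<s>` between the two syllables of the first word.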

Full Text