Speech Synthesis of Tibetan Amdo Dialect Based on Attention and Recurrent Neural Network

Zhenye Gan, Zhimeng Song, Min Luo, Min Zhang

doi:10.1109/icise-ie53922.2021.00045

Abstract

This paper proposes a speech synthesis method for the Tibetan Amdo Dialect based on an attention mechanism and a Recurrent Neural Network (RNN). It centers on a Sequence to Sequence structure with an attention mechanism. The advantage of this structure is that it accepts characters as input and outputs the corresponding raw spectrogram, from which speech is generated directly by a vocoder algorithm. To address information overload and improve the efficiency and accuracy of task processing, an attention module is added between the encoder and the decoder to learn the alignment from the text sequence to the Mel spectrogram sequence, which makes full use of the information carried by the input sequence. Using this structure and a speech synthesis corpus of the Tibetan Amdo Dialect, we implement and explore speech synthesis at the syllable and phoneme levels. The experimental results show that the quality of speech synthesized by this method is improved compared with traditional speech synthesis methods.
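To make the role of the attention module concrete, the following is a minimal sketch of a single decoder step attending over encoder outputs. The abstract does not state which scoring function is used, so additive (Bahdanau-style) attention is assumed here purely for illustration. All names (`additive_attention`, `W_q`, `W_m`, `v`), the toy dimensions, and the random weights are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that are non-negative and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, mat):
    """Multiply a vector of length n by an n x k matrix, giving length k."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def additive_attention(query, memory, W_q, W_m, v):
    """One attention step (assumed additive scoring, not the paper's stated form).

    query  : decoder state for the current output frame
    memory : list of encoder outputs, one per input character
    Returns the context vector and the alignment weights over the input.
    """
    q_proj = project(query, W_q)
    scores = []
    for h in memory:
        h_proj = project(h, W_m)
        # score = v . tanh(W_q q + W_m h)
        scores.append(sum(v_k * math.tanh(a + b)
                          for v_k, a, b in zip(v, q_proj, h_proj)))
    weights = softmax(scores)
    # The context is the alignment-weighted sum of the encoder outputs.
    context = [sum(w * h[j] for w, h in zip(weights, memory))
               for j in range(len(memory[0]))]
    return context, weights


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy setup: 6 input characters, hypothetical layer sizes.
random.seed(0)
T, d_enc, d_dec, d_att = 6, 8, 5, 4
memory = rand_matrix(T, d_enc)
query = [random.gauss(0.0, 1.0) for _ in range(d_dec)]
W_m = rand_matrix(d_enc, d_att)
W_q = rand_matrix(d_dec, d_att)
v = [random.gauss(0.0, 1.0) for _ in range(d_att)]

context, weights = additive_attention(query, memory, W_q, W_m, v)
```

Here `weights` holds one value per input character and plays the role of the text-to-spectrogram alignment for one output frame, while `context` is the summary of the input that a decoder would consume when predicting that frame. In a trained model the projection matrices are learned rather than random.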

Full Text