Research on Tibetan Speech Synthesis Based on Fastspeech2

Ba Zu, Zhaxi Pengmao, Rangzhuoma Cai, Zhijie Cai

doi:10.1109/prml56267.2022.9882187

Abstract

As a core technology of human-computer interaction, speech synthesis plays an important role in education, daily life, and science and technology. As a research hotspot in artificial intelligence, it has achieved strong results not only in Mandarin but also in minority languages such as Tibetan. At present, Tibetan speech synthesis research is mainly based on autoregressive models, which far outperform traditional models and can synthesize high-quality speech. However, because autoregressive inference is slow and the acoustic model learns duration alignment, pitch, and energy only implicitly, these systems suffer from slow synthesis, repeated or skipped words, and an inability to control speech rate and prosody at a fine-grained level. To address these problems, this paper studies Tibetan text-to-speech alignment and Tibetan speech synthesis based on a non-autoregressive acoustic model combined with a vocoder. First, Tibetan speech is aligned with phonemes using a hidden Markov model-Gaussian mixture model (HMM-GMM) alignment model. Second, phoneme durations from real speech, together with variance information such as pitch and energy, are introduced into the FastSpeech2 acoustic model, and the Variance Adapter is used to address the one-to-many mapping problem of traditional speech synthesis, reducing word skipping and repetition. Finally, to balance synthesis speed and quality, a pre-trained HiFi-GAN vocoder converts the mel spectrogram to speech.
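The role of explicit phoneme durations can be illustrated with a minimal sketch. It assumes the aligner yields (phoneme, start, end) intervals in seconds, and it expands one hidden vector per phoneme to one vector per mel frame, in the spirit of a FastSpeech-style length regulator. The function names, the sample rate, the hop length, the `speed` parameter, and the phoneme labels are all hypothetical; none of them come from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed audio settings for illustration only; the abstract does not state them.
SAMPLE_RATE = 22050
HOP_LENGTH = 256


def durations_from_alignment(
    intervals: Sequence[Tuple[str, float, float]],
    sample_rate: int = SAMPLE_RATE,
    hop_length: int = HOP_LENGTH,
) -> List[int]:
    """Convert (phoneme, start_sec, end_sec) intervals to mel-frame counts.

    Boundaries are rounded to frame indices first, so the per-phoneme counts
    add up to the rounded end frame of the last interval.
    """
    frames_per_second = sample_rate / hop_length
    durations = []
    for _phoneme, start, end in intervals:
        start_frame = round(start * frames_per_second)
        end_frame = round(end * frames_per_second)
        durations.append(max(end_frame - start_frame, 0))
    return durations


def length_regulate(
    hidden: Sequence[Sequence[float]],
    durations: Sequence[int],
    speed: float = 1.0,
) -> List[Sequence[float]]:
    """Repeat each phoneme-level vector once per frame of its duration.

    `speed` is a hypothetical control: values above 1.0 shorten every
    duration, which corresponds to a faster speech rate.
    """
    if len(hidden) != len(durations):
        raise ValueError("need exactly one duration per phoneme vector")
    expanded = []
    for vector, duration in zip(hidden, durations):
        repeat = max(int(round(duration / speed)), 0)
        expanded.extend([vector] * repeat)
    return expanded


if __name__ == "__main__":
    # Hypothetical alignment for two phonemes.
    alignment = [("k", 0.00, 0.03), ("a", 0.03, 0.08)]
    frame_counts = durations_from_alignment(alignment, sample_rate=16000, hop_length=160)
    frames = length_regulate([[0.1, 0.2], [0.3, 0.4]], frame_counts)
    print(frame_counts, len(frames))
```

Because every output frame is tied to exactly one input phoneme, a phoneme cannot be skipped or repeated indefinitely, and scaling the durations gives direct control over speech rate. A real system would operate on tensors and predict the durations at inference time; this sketch only shows the expansion step.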

Similar Papers

Deep Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis
Weizhao Zhang ... Lili Wang
IEEE Access | VOL. 7
01 Jan 2019

Meta-Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis
Weizhao Zhang ... Hongwu Yang
Applied Sciences | VOL. 12
28 Nov 2022

A DNN-based Mandarin-Tibetan cross-lingual speech synthesis
Weitong Guo ... Zhenye Gan
01 Nov 2018

Improving Sequence-to-sequence Tibetan Speech Synthesis with Prosodic Information
Weizhao Zhang ... Hongwu Yang
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 22
22 Sep 2023
