Abstract
This paper proposes a meta-learning-based Mandarin-Tibetan cross-lingual text-to-speech (TTS) method that synthesizes both Mandarin and Tibetan speech within a single framework. First, we build two Tacotron2-based Mandarin-Tibetan cross-lingual baseline TTS systems: one with a shared encoder and the other with separate encoders. Both baselines use a speaker classifier with a gradient reversal layer to disentangle speaker-specific information from the text encoder. We also design a prosody generator that extracts prosodic information from sentences so that syntactic and semantic information is exploited more fully. To further improve the quality of speech synthesized by the Tacotron2-based baselines, we propose a meta-learning-based Mandarin-Tibetan cross-lingual TTS system. Building on the separate-encoder baseline, we use an additional dynamic network to predict the parameters of the language-dependent text encoder, which enables better cross-lingual knowledge sharing in the sequence-to-sequence TTS model. Finally, we synthesize Mandarin or Tibetan speech with the single acoustic model. The baseline experiments show that the separate-encoder system handles input in different languages better than the shared-encoder system. Further experiments show that the proposed meta-learning-based method effectively improves the voice quality of the synthesized speech in terms of naturalness and speaker similarity.
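The two mechanisms named in the abstract, the gradient reversal layer and the dynamic network that predicts encoder parameters, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify either component's architecture, so the single linear "encoder" layer, the one-hot language embeddings, and all function names and dimensions below are assumptions chosen only for illustration.

```python
import random


def grad_reverse_forward(x):
    """Forward pass of a gradient reversal layer: the identity."""
    return list(x)


def grad_reverse_backward(grad_output, lam=1.0):
    """Backward pass: negate the incoming gradient and scale it by lam.

    The speaker classifier is trained normally, while the text encoder
    behind this layer receives the reversed gradient and is pushed to
    discard speaker-specific information.
    """
    return [-lam * g for g in grad_output]


def make_generator(lang_dim, in_dim, out_dim, seed=0):
    """Hypothetical parameter generator (assumed form: one linear map).

    Each row maps a language embedding to one weight of an
    (out_dim x in_dim) encoder layer.
    """
    rng = random.Random(seed)
    n_params = out_dim * in_dim
    return [[rng.uniform(-0.1, 0.1) for _ in range(lang_dim)]
            for _ in range(n_params)]


def generate_weights(generator, lang_embedding, in_dim, out_dim):
    """Predict the encoder layer's weights from a language embedding."""
    flat = [sum(g * e for g, e in zip(row, lang_embedding))
            for row in generator]
    return [flat[r * in_dim:(r + 1) * in_dim] for r in range(out_dim)]


def encode(weights, x):
    """Apply the generated linear layer to one input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Assumed one-hot language embeddings; a real system would learn these.
LANG_EMBEDDINGS = {"mandarin": [1.0, 0.0], "tibetan": [0.0, 1.0]}

generator = make_generator(lang_dim=2, in_dim=3, out_dim=4)
x = [0.5, -1.0, 2.0]  # stand-in for one text-embedding vector

outputs = {
    lang: encode(generate_weights(generator, emb, in_dim=3, out_dim=4), x)
    for lang, emb in LANG_EMBEDDINGS.items()
}
```

In this sketch the only trainable parameters of the encoder live in `generator`, which is shared by both languages; the language embedding selects the language-dependent weights. That is one plausible reading of how predicting the encoder parameters could support cross-lingual knowledge sharing.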