Tibetan Speech Synthesis Based on Pre-Trained Mixture Alignment FastSpeech2

Qing Zhou, Xiaona Xu, Yue Zhao

doi:10.3390/app14156834

Abstract

Most current research on Tibetan speech synthesis relies on autoregressive deep learning models. However, these models suffer from slow inference, skipped words, and repeated words. To overcome these issues, we propose an enhanced non-autoregressive acoustic model combined with a vocoder for Tibetan speech synthesis. Specifically, we introduce the mixture alignment FastSpeech2 method to correct errors caused by hard alignment in the original FastSpeech2 method. The new method employs soft alignment at the level of Latin letters and hard alignment at the level of Tibetan characters, which improves the alignment accuracy between text and speech and enhances the naturalness and intelligibility of the synthesized speech. We also integrate pitch and energy information into the model, further improving overall synthesis quality. Because Tibetan has smaller text-to-audio datasets than widely studied languages, we adopt a transfer learning approach: the model is first pre-trained on data from resource-rich languages, and the pre-trained mixture alignment FastSpeech2 model is then fine-tuned for Tibetan speech synthesis. Experimental results demonstrate that the mixture alignment FastSpeech2 model produces higher-quality speech than the original FastSpeech2 model, and that pre-training on an English dataset yields further improvements in clarity and naturalness.
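The abstract does not give the alignment equations, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes that each Tibetan character is represented by the vectors of its Latin letters, that soft alignment can be modelled as a weighted average of those letter vectors, and that hard alignment is a FastSpeech-style length regulator that repeats each character vector for an integer number of frames. The function names `soft_pool`, `hard_expand`, and `mixture_align` are hypothetical.

```python
def soft_pool(letter_vectors, weights):
    """Soft alignment (assumed form): weighted average of letter vectors."""
    if len(letter_vectors) != len(weights):
        raise ValueError("one weight per letter vector is required")
    total = sum(weights)
    dim = len(letter_vectors[0])
    return [
        sum(w * vec[k] for w, vec in zip(weights, letter_vectors)) / total
        for k in range(dim)
    ]


def hard_expand(char_vectors, durations):
    """Hard alignment: repeat each character vector durations[i] times,
    as a length regulator does in FastSpeech-style models."""
    if len(char_vectors) != len(durations):
        raise ValueError("one duration per character vector is required")
    frames = []
    for vec, dur in zip(char_vectors, durations):
        frames.extend(list(vec) for _ in range(dur))
    return frames


def mixture_align(chars, durations):
    """Hypothetical combination: soft pooling over letters inside each
    character, then hard expansion of characters to frame level.

    chars: list of (letter_vectors, letter_weights), one per character.
    durations: integer frame count per character.
    """
    pooled = [soft_pool(vectors, weights) for vectors, weights in chars]
    return hard_expand(pooled, durations)


# Two characters: the first has two letters, the second has one.
example_chars = [
    ([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]),
    ([[2.0, 2.0]], [1.0]),
]
frames = mixture_align(example_chars, durations=[2, 1])
print(frames)  # [[0.5, 0.5], [0.5, 0.5], [2.0, 2.0]]
```

In a real model the letter weights and character durations would be predicted or learned rather than supplied by hand, and the vectors would be encoder hidden states.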

Journal: Applied Sciences
Publication Date: Aug 5, 2024
License type: CC BY 4.0
