Abstract

One of the main goals of text-to-speech (TTS) adaptation techniques is to produce a model that can generate good-quality audio from a small amount of training data. TTS systems for high-resource languages achieve good quality because they are trained on large amounts of data, whereas training models on small (low-resource) datasets is difficult and often yields low-quality speech. One approach to overcoming this data limitation is fine-tuning; however, it still requires a pre-trained model that has learned from a large amount of data in advance. This paper presents two contributions: (1) a study of the amount of data needed for a traditional fine-tuning method for Vietnamese, in which we vary the data and run the training for a few additional iterations; and (2) a new fine-tuning pipeline that borrows a pre-trained model from English and adapts it to any Vietnamese voice with a very small amount of data while still producing good synthetic speech. Our experiments show that with only 4 minutes of data we can synthesize a new voice with a good similarity score, and with 16 minutes of data the model can generate audio with a MOS of 3.8.
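
To illustrate the general transfer-learning idea behind contribution (2), the sketch below shows one common way cross-lingual fine-tuning of an acoustic model is set up: load English-pretrained weights, swap the symbol embedding for the Vietnamese phoneme inventory, and fine-tune on the small target dataset with a low learning rate. This is a minimal sketch, not the paper's actual pipeline; the toy model class, the checkpoint path, and the symbol-inventory sizes are assumptions made for illustration.

```python
# Minimal sketch of cross-lingual TTS fine-tuning (not the authors' code).
import torch
import torch.nn as nn

class ToyAcousticModel(nn.Module):
    """Stand-in for a sequence-to-mel acoustic model."""
    def __init__(self, n_symbols: int, d_model: int = 256, n_mels: int = 80):
        super().__init__()
        self.embedding = nn.Embedding(n_symbols, d_model)           # language-specific
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)   # shared weights
        self.mel_head = nn.Linear(d_model, n_mels)                  # shared weights

    def forward(self, phoneme_ids):
        x = self.embedding(phoneme_ids)
        h, _ = self.encoder(x)
        return self.mel_head(h)

# 1) Start from a model pretrained on a high-resource language (path is hypothetical).
model = ToyAcousticModel(n_symbols=100)
# model.load_state_dict(torch.load("english_pretrained.pt"))

# 2) Replace the symbol embedding with one sized for the Vietnamese phoneme
#    inventory (size is hypothetical); the shared layers keep their weights.
VIETNAMESE_SYMBOLS = 120
model.embedding = nn.Embedding(VIETNAMESE_SYMBOLS, 256)

# 3) Fine-tune with a small learning rate on the tiny target-voice dataset.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.L1Loss()

# Dummy batch standing in for a few minutes of (phoneme, mel-spectrogram) pairs.
phonemes = torch.randint(0, VIETNAMESE_SYMBOLS, (8, 50))
target_mels = torch.randn(8, 50, 80)

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(phonemes), target_mels)
    loss.backward()
    optimizer.step()
```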
