Fine Tuning and Comparing Tacotron 2, Deep Voice 3, and FastSpeech 2 TTS Models in a Low Resource Environment

T Gopalakrishnan, Archit Aggarwal, Syed Ayaz Imam

doi:10.1109/icdsis55133.2022.9915932

Abstract

Text-to-speech (TTS) models generate speech from an input sequence of characters. Existing TTS systems require a large, high-quality dataset and vast computational resources for training. However, most publicly available datasets do not meet such standards, and access to powerful GPUs may not always be possible. Hence, in our work, we trained and compared three TTS models, Tacotron 2, FastSpeech 2, and Deep Voice 3, on a Tesla T4 GPU using a subset of the LJSpeech 1.1 dataset. We then conducted a listening survey to analyze the performance of the models when trained on small datasets, and found that the Tacotron 2 TTS model synthesized the most realistic-sounding speech. The survey showed that Tacotron 2 achieved a mean opinion score (MOS) of 4.25 ± 0.17 at a 95% confidence interval and sounded the most natural to our listeners when compared to the ground truth.
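The abstract reports the MOS as a mean with a 95% confidence half-width. As a rough illustration of how such a figure can be computed from listener ratings, here is a minimal Python sketch. It assumes a normal approximation (z = 1.96) and ratings on a 1-5 scale; the abstract does not state which interval method the authors used, the function name `mos_with_ci` is hypothetical, and the ratings below are invented for illustration and are not the paper's data.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of opinion scores.

    Assumption: normal-approximation interval,
    half_width = z * s / sqrt(n), with s the sample standard deviation.
    z = 1.96 corresponds to roughly 95% coverage.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    return mean, z * sd / math.sqrt(n)


# Hypothetical listener ratings, invented for this example only.
example_ratings = [5, 4, 4, 5, 3, 4, 5, 4, 3, 4]
mos, half_width = mos_with_ci(example_ratings)
print(f"MOS = {mos:.2f} ± {half_width:.2f}")
```

With few listeners, a Student's t critical value in place of 1.96 would give a somewhat wider and more conservative interval.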

Similar Papers

Comparing human-labeled and AI-labeled speech datasets for TTS
Johannes Wirth ... René Peinl
International Conference on AI Research | VOL. 4
04 Dec 2024

The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset.
Duc Chung Tran
Data in brief | VOL. 31
27 May 2020

Hierarchical Transfer Learning for Text-to-Speech in Indonesian, Javanese, and Sundanese Languages
Kurniawati Azizah ... Mirna Adriani
17 Oct 2020

A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis
Rishabh Jain ... Horia Cucu
IEEE Access | VOL. 10
01 Jan 2021
