A comparison of Vietnamese Statistical Parametric Speech Synthesis Systems

Huy Kinh Phan, Quoc Bao Nguyen, Viet Lam Phung, Anh Tuan Dinh

doi:10.1109/kse50997.2020.9287553

Abstract

In recent years, statistical parametric speech synthesis (SPSS) systems have been widely used in interactive speech-based systems (e.g. Amazon’s Alexa, Bose’s headphones). Selecting a suitable SPSS system requires weighing both speech quality and performance efficiency (e.g. decoding time). In this paper, we compared four popular Vietnamese SPSS techniques in terms of speech quality and performance efficiency: 1) hidden Markov models (HMM), 2) deep neural networks (DNN), 3) generative adversarial networks (GAN), and 4) end-to-end (E2E) architectures, which consist of Tacotron 2 and a WaveGlow vocoder. We showed that the E2E systems achieved the best quality but required a GPU to reach real-time performance. We also showed that the HMM-based system had inferior speech quality but was the most efficient system. Surprisingly, the E2E systems were more efficient than the DNN- and GAN-based systems at inference on GPU, and the GAN-based system did not outperform the DNN-based system in terms of quality.
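The notion of real-time performance mentioned in the abstract is commonly expressed as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, with RTF below 1 meaning the system generates speech faster than it plays back. The abstract does not say which efficiency metric the authors used beyond decoding time, so the following is only a minimal sketch under that assumption. The names `real_time_factor`, `is_real_time`, and `dummy_synthesize` are hypothetical and stand in for whichever synthesizer is being measured.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Time one synthesis call and return (rtf, audio_seconds).

    `synthesize` is a hypothetical callable mapping text to a sequence
    of audio samples; it is not an API from the paper.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds, audio_seconds


def is_real_time(rtf):
    """A system keeps up with playback when RTF is below 1."""
    return rtf < 1.0


def dummy_synthesize(text, sample_rate=22050):
    """Placeholder synthesizer: returns one second of silence."""
    return [0.0] * sample_rate


rtf, audio_seconds = real_time_factor(dummy_synthesize, "xin chào", 22050)
print(f"audio: {audio_seconds:.2f} s, real-time: {is_real_time(rtf)}")
```

When timing a model that runs on a GPU, the device work is typically launched asynchronously, so the measurement should wait for the device to finish before reading the clock; otherwise the elapsed time may be underestimated.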
