GPR-based Thai speech synthesis using multi-level duration prediction

Decha Moungsri, Tomoki Koriyama, Takao Kobayashi

doi:10.1016/j.specom.2018.03.005

Abstract

This paper proposes a multi-level Gaussian process regression (GPR)-based method for duration prediction that incorporates phone- and syllable-level duration models. In this method, we first train the syllable-level model and predict syllable durations from the input context labels. We then use the predicted syllable duration as an additional context for the phone-level model, which predicts phone durations. To apply multi-level duration prediction to the GPR-based speech synthesis framework, we designed phone- and syllable-level context sets for Thai that include linguistic information and the relative positions of speech units. We also examined a multi-level deep neural network (DNN)-based duration-prediction method built with the same approach as the proposed multi-level GPR-based one. We conducted objective and subjective evaluations using two hours of training data to compare the proposed method with single-level ones. The results indicate that the proposed multi-level duration-prediction method outperformed single-level ones in both the DNN- and GPR-based frameworks. They also indicate that the proposed multi-level GPR-based method can provide better performance than the multi-level HMM-based duration-prediction method.
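
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ToyGPR` class, the `augment` helper, the feature layout (phone count and tone index for syllables, position and vowel flag for phones), and all duration values are hypothetical, chosen only to show how a predicted syllable duration can be fed to a phone-level model as an extra context. The sketch also assumes that the phone-level model is trained on syllable durations predicted by the syllable-level model, which the abstract does not state explicitly.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class ToyGPR:
    """GP regression posterior mean with a constant prior mean (illustrative only)."""
    def __init__(self, length=1.0, noise=1e-2):
        self.length, self.noise = length, noise

    def fit(self, X, y):
        self.X, self.mean = X, sum(y) / len(y)
        # Gram matrix plus observation noise on the diagonal.
        K = [[rbf(a, b, self.length) + (self.noise if i == j else 0.0)
              for j, b in enumerate(X)] for i, a in enumerate(X)]
        self.alpha = chol_solve(cholesky(K), [v - self.mean for v in y])
        return self

    def predict(self, X):
        return [self.mean + sum(rbf(x, xi, self.length) * a
                                for xi, a in zip(self.X, self.alpha)) for x in X]

def augment(phone_feats, parents, syl_durs):
    """Append each phone's parent-syllable duration as an extra context feature."""
    return [f + [syl_durs[p]] for f, p in zip(phone_feats, parents)]

# Hypothetical training data; durations are in seconds.
syl_X = [[2, 0], [3, 1], [2, 2], [3, 3], [2, 4], [3, 0]]   # [n_phones, tone]
syl_y = [0.18, 0.26, 0.20, 0.28, 0.19, 0.25]
phone_X = [[0, 0], [1, 1], [0, 0], [1, 1], [2, 0], [0, 0], [1, 1]]  # [position, is_vowel]
phone_parent = [0, 0, 1, 1, 1, 2, 2]                        # index into syl_X
phone_y = [0.06, 0.12, 0.07, 0.12, 0.07, 0.07, 0.13]

# Stage 1: train the syllable-level model and predict syllable durations.
syl_model = ToyGPR().fit(syl_X, syl_y)
syl_train_pred = syl_model.predict(syl_X)

# Stage 2: train the phone-level model with the predicted syllable
# duration as an additional context (assumption: predicted, not natural).
phone_model = ToyGPR().fit(augment(phone_X, phone_parent, syl_train_pred), phone_y)

# Synthesis time: predict syllable durations first, then phone durations.
test_syl = [[2, 1], [3, 2]]
test_phones = [[0, 0], [1, 1], [0, 0], [1, 1], [2, 0]]
test_parent = [0, 0, 1, 1, 1]
syl_pred = syl_model.predict(test_syl)
phone_pred = phone_model.predict(augment(test_phones, test_parent, syl_pred))
print([round(d, 3) for d in phone_pred])
```

A single-level baseline in this sketch would simply fit `ToyGPR` on `phone_X` without calling `augment`. In practice the kernel hyperparameters would be optimized rather than fixed, and a library GPR implementation would replace the hand-written solver.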

Journal: Speech Communication
Publication Date: Mar 14, 2018
Citations: 2

Similar Papers

Deep Recurrent Neural Networks in Speech Synthesis Using a Continuous Vocoder
Mohammed Salah Al-Radhi ... Tamás Gábor Csapó
01 Jan 2017

A DNN-based Mandarin-Tibetan cross-lingual speech synthesis
Weitong Guo ... Zhenye Gan
01 Nov 2018

Parametric speech synthesis based on Gaussian process regression using global variance and hyperparameter optimization
Tomoki Koriyama ... Takao Kobayashi
01 May 2014

Assessing the degree of nativeness and Parkinson's condition using Gaussian processes and deep rectifier neural networks
Tamás Grósz ... László Tóth
06 Sep 2015
