Building HMM-TTS Voices on Diverse Data

Vincent Wan, Kayoko Yanagisawa, Masami Akamine, Mark J F Gales, Langzhou Chen, Javier Latorre, Norbert Braunschweiler

doi:10.1109/jstsp.2013.2295058

Abstract

The statistical models of hidden Markov model based text-to-speech (HMM-TTS) systems are typically built using homogeneous data. Data can be acquired from many different sources, but combining them yields a non-homogeneous, or diverse, dataset. This paper describes the application of average voice models (AVMs) and a novel application of cluster adaptive training (CAT) with multiple context-dependent decision trees to create HMM-TTS voices from diverse data: speech recorded in studios mixed with speech obtained from the internet. Training AVM and CAT models on diverse data yields better quality speech than training on high-quality studio data alone. Tests show that CAT can create a voice for a target speaker with as little as 7 seconds of adaptation data; an AVM needs more data to reach the same level of similarity to the target speaker. Tests also show that CAT produces higher quality voices than AVMs irrespective of the amount of adaptation data. Lastly, it is shown that modelling the data with multiple context clustering decision trees is beneficial.
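As background for readers unfamiliar with CAT, the general idea in the literature is that a speaker-specific mean vector is formed as a weighted sum of cluster mean vectors, so adapting to a new speaker amounts to estimating a small set of weights. The sketch below illustrates only that interpolation step. It is not the authors' implementation: the function name `cat_mean`, the toy 2-dimensional means, and the weight values are hypothetical, and treating the first cluster as a bias with weight 1.0 is an assumption about a common convention rather than something stated in the abstract.

```python
from typing import List


def cat_mean(cluster_means: List[List[float]], weights: List[float]) -> List[float]:
    """Return the weighted sum of cluster mean vectors (one weight per cluster)."""
    if len(cluster_means) != len(weights):
        raise ValueError("need exactly one weight per cluster")
    dim = len(cluster_means[0])
    # Combine the clusters dimension by dimension.
    return [
        sum(w * mu[d] for w, mu in zip(weights, cluster_means))
        for d in range(dim)
    ]


# Hypothetical toy values: three clusters with 2-dimensional means.
# The first cluster is treated as a bias cluster (assumed weight 1.0).
cluster_means = [[1.0, 2.0], [0.5, -1.0], [-2.0, 4.0]]
speaker_weights = [1.0, 0.5, 0.25]

adapted = cat_mean(cluster_means, speaker_weights)
print(adapted)  # [0.75, 2.5]
```

In this toy setting a speaker is described by three numbers instead of a full set of model parameters, which gives some intuition for why a weight-based representation might be estimated from very little speech.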

Journal: IEEE Journal of Selected Topics in Signal Processing	Publication Date: Apr 1, 2014
Citations: 6

Similar Papers

Cluster adaptive training of average voice models
Vincent Wan ... Javier Latorre
01 May 2014

Combining multiple high quality corpora for improving HMM-TTS
Vincent Wan ... Heiga Zen
09 Sep 2012

Cluster adaptive training with factorized decision trees for speech recognition
Kai Yu ... Hainan Xu
25 Aug 2013

Multiple-average-voice-based speech synthesis
Pierre Lanchantin ... Mark J.F Gales
01 May 2014
