Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization

H Zen, J Latorre, N Braunschweiler, M J F Gales, K Knill, S Krstulovic, S Buchholz

doi:10.1109/tasl.2012.2187195

Abstract

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language.
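The two transform types named in the abstract can be illustrated with a toy sketch. This is only a minimal illustration under stated assumptions, not the paper's implementation: it assumes cluster mean interpolation forms a mean vector as a weighted sum of per-cluster means, and that a constrained maximum-likelihood linear regression transform acts on a feature vector as an affine map. All function names, variable names, and numbers below are hypothetical, and decision trees, HMM states, and parameter estimation are left out entirely.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def interpolate_cluster_means(cluster_means: Sequence[Vector],
                              weights: Sequence[float]) -> Vector:
    """Weighted sum of cluster mean vectors (assumed form of the language transform)."""
    if len(cluster_means) != len(weights):
        raise ValueError("need exactly one interpolation weight per cluster")
    dim = len(cluster_means[0])
    return [sum(w * mu[d] for w, mu in zip(weights, cluster_means))
            for d in range(dim)]


def apply_cmllr(A: Matrix, b: Vector, observation: Vector) -> Vector:
    """Affine feature transform A @ o + b (assumed form of the speaker transform)."""
    return [sum(a * o for a, o in zip(row, observation)) + bias
            for row, bias in zip(A, b)]


# Hypothetical toy values: two clusters, two-dimensional features.
cluster_means = [[1.0, 2.0], [3.0, 0.0]]
language_weights = [1.0, 0.5]   # one weight per cluster for a given language
language_mean = interpolate_cluster_means(cluster_means, language_weights)

# Hypothetical speaker transform applied to one observation vector.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [0.5, -1.0]
speaker_normalized = apply_cmllr(A, b, [1.0, 3.0])

print(language_mean)
print(speaker_normalized)
```

In this sketch the language-dependent part only changes the interpolation weights and the speaker-dependent part only changes `A` and `b`, which is the separation the abstract describes as factorization.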

Journal: IEEE Transactions on Audio, Speech, and Language Processing
Publication Date: Aug 1, 2012
Citations: 125

Similar Papers

A Bilingual Kazakh-Russian System for Automatic Speech Recognition and Synthesis
Olga Khomitsevich ... Valentin Mendelev
01 Jan 2015

An Exchange Protocol for Continuous Speech Recognition and Synthesis System
G Osman
IFAC Proceedings Volumes | VOL. 16
01 Sep 1983

Phonetic alignment: speech synthesis-based vs. Viterbi-based
F Malfrère ... C Ris
Speech Communication | VOL. 40
13 Sep 2002

An automatic speech recognition system on DSP board
Guo-Shing Huang ... Zhi-Hao Tian
01 Nov 2016
