A global, boundary-centric framework for unit selection text-to-speech synthesis

J.R. Bellegarda

doi:10.1109/tsa.2005.858048

Abstract

The level of quality that can be achieved by modern concatenative text-to-speech synthesis heavily depends on the optimization criteria used in the unit selection process. While effective cost functions arise naturally for prosody assessment, the criteria typically selected to quantify discontinuities in the speech signal do not closely reflect users' perception of the resulting acoustic waveform. This paper introduces an alternative feature extraction paradigm, which eschews general-purpose Fourier analysis in favor of a modal decomposition separately optimized for each boundary region. The ensuing transform framework preserves, by construction, those properties of the waveform which are globally relevant to each concatenation considered. In addition, it leads to a novel discontinuity measure which jointly, albeit implicitly, accounts for both interframe incoherence and discrepancies in formant frequencies/bandwidths. Experimental evaluations are conducted to characterize the behavior of this new metric, first on a contiguity prediction task, and then via a systematic listening comparison using a conventional metric as baseline. The results underscore the viability of the proposed framework in quantifying the perception of discontinuity between acoustic units.
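The abstract does not spell out which modal decomposition is used. As a rough illustration only, the sketch below assumes a singular value decomposition of a matrix whose rows are pitch-synchronous frames taken from both sides of a candidate boundary, keeps the dominant modes, and scores the join as the cosine distance between the last frame of the left unit and the first frame of the right unit in that reduced space. The function name `boundary_discontinuity`, the `rank` parameter, and the use of cosine distance are hypothetical choices for this example and should not be read as the paper's actual metric.

```python
import numpy as np


def boundary_discontinuity(left_frames: np.ndarray,
                           right_frames: np.ndarray,
                           rank: int = 4) -> float:
    """Hypothetical boundary-centric discontinuity score (illustrative only).

    left_frames, right_frames: arrays of shape (K, N) holding K
    pitch-synchronous frames of N samples on each side of the join.
    Returns a cosine distance in [0, 2]; 0 means the two frames adjacent
    to the boundary point the same way in the boundary-specific space.
    """
    k = left_frames.shape[0]
    # Boundary-centric matrix: frames from both units, stacked in time order.
    W = np.vstack([left_frames, right_frames]).astype(float)
    # Modal decomposition W = U diag(s) Vt, computed for this boundary only.
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    # Keep the dominant modes; at full rank the cosine below would equal
    # the cosine between the raw frames, so truncation is what matters.
    r = min(rank, s.size)
    coords = U[:, :r] * s[:r]  # row i = frame i expressed in the top-r modes
    a, b = coords[k - 1], coords[k]  # last left frame, first right frame
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), 1e-12)
    return float(1.0 - (a @ b) / denom)


# Usage: identical periods on both sides vs. a polarity flip at the join.
t = np.linspace(0.0, 1.0, 80, endpoint=False)
period = np.sin(2 * np.pi * t)
left = np.tile(period, (3, 1))
print(boundary_discontinuity(left, left.copy()) < 1e-9)       # True
print(abs(boundary_discontinuity(left, -left) - 2.0) < 1e-9)  # True
```

The design choice mirrors the contrast drawn in the abstract: the basis is recomputed for every boundary from the frames around it, rather than fixed in advance as a Fourier basis would be.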



Journal: IEEE Transactions on Audio, Speech and Language Processing
Publication Date: May 1, 2006
Citations: 44

Similar Papers

Gradient-Descent Based Unit-Selection Optimization Algorithm Used for Corpus-Based Text-to-Speech Synthesis
Matej Rojc ... Zdravko Kačič
Applied Artificial Intelligence, Vol. 25, 01 Aug 2011

Unit Selection Using Linguistic, Prosodic and Spectral Distance for Developing Text-to-Speech System in Hindi
K Sreenivasa Rao ... Sudhamay Maity
01 Jan 2009

Globally Optimal Training of Unit Boundaries in Unit Selection Text-to-Speech Synthesis
Jerome R Bellegarda
IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, 01 Mar 2007

Joint prosody prediction and unit selection for concatenative speech synthesis
I Bulyko ... M Ostendorf
07 May 2001
