Abstract
It has been conjectured that articulatory synthesis possesses the greatest potential for generating high-quality synthetic speech. However, for text-to-speech (TTS), waveform concatenation techniques have proven more practical, due in part to the challenge of generating appropriate trajectories of articulatory parameters. A waveform generation method for TTS that combines the practical success of concatenative methods with the quality potential of articulatory synthesis is under development. The system concatenates articulatory units derived from natural speech using an articulatory voice mimic. The mimic estimates articulatory parameters by minimizing a cost function that includes a spectral distance between natural and synthetic speech and a geometric distance that penalizes rapid or discontinuous changes in articulator positions. A database of articulatory trajectories representing phonetic units is constructed from the estimated parameters. For TTS, phonetic units generated by text analysis are used to select the corresponding articulatory units from the database. Duration modification, concatenation, and smoothing across units are performed in the articulatory domain, resulting in a single articulatory trajectory for the complete utterance. Speech is synthesized from the trajectory using a two-mass model for voicing, achieving a high degree of acoustic continuity across unit boundaries while also allowing for source–tract interaction.
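The abstract's cost function, combining a spectral distance with a geometric continuity penalty, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weighting terms `alpha` and `beta`, the mean-squared-error distances, and the `synthesize_spectra` callable (mapping an articulatory trajectory to synthetic spectra) are all assumptions introduced here for clarity.

```python
import numpy as np

def mimic_cost(articulatory_traj, natural_spectra, synthesize_spectra,
               alpha=1.0, beta=0.1):
    """Hypothetical mimic cost: spectral distance plus geometric penalty.

    articulatory_traj : (frames, params) array of articulator positions
    natural_spectra   : (frames, bins) spectra of the natural utterance
    synthesize_spectra: assumed callable mapping a trajectory to spectra
    alpha, beta       : assumed weights balancing the two terms
    """
    synth_spectra = synthesize_spectra(articulatory_traj)
    # spectral distance between natural and synthetic speech
    spectral = np.mean((natural_spectra - synth_spectra) ** 2)
    # geometric distance: penalize rapid or discontinuous changes
    # in articulator positions via frame-to-frame differences
    geometric = np.mean(np.diff(articulatory_traj, axis=0) ** 2)
    return alpha * spectral + beta * geometric
```

In a full system this cost would be minimized per frame or per segment by a numerical optimizer; the sketch only shows how the two terms of the cost might be combined.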