Clustering of duration patterns in speech for Text-to-Speech Synthesis

K. S. Sreelekshmi, Deepa P. Gopinath

doi:10.1109/indcon.2012.6420785

Abstract

Synthesis of natural-sounding speech is the greatest challenge in a Text-to-Speech Synthesis (TTS) system. In natural speech, duration, intensity and pitch vary dynamically, and these variations manifest as the rhythm, or prosody, of speech. If these variations are not recreated, the synthesized speech sounds robotic. The quality of synthesized speech depends on how well duration and intonation patterns are imposed on the speech segments. The best way to improve naturalness is to mimic the way the human brain imposes rhythm. We speak in a particular style by varying the durations of the segments in words and phrases according to specific duration patterns, and the brain may retrieve the corresponding patterns at the time of speaking to produce discourse in a particular style (news reading, Bible reading, storytelling, etc.). The main objective of this work is to investigate the existence of duration patterns in natural speech using cluster analysis. Speech in Malayalam, an Indian language, was taken for analysis. Cluster analysis was performed on isolated words as well as on words and phrases in continuous speech. Silhouette plots of the clustering results showed the existence of duration patterns in speech.
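The abstract does not state which clustering algorithm or feature representation was used. The following is a minimal sketch, assuming k-means on per-word vectors of segment durations normalized to sum to one, and using made-up duration data rather than the authors' Malayalam recordings. It illustrates how recurring duration patterns could be grouped and then checked with silhouette values, s = (b - a) / max(a, b), where a is a word's mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster. Values near 1 indicate well-separated clusters. All function names (normalize, synthetic_words, farthest_first, kmeans, silhouette_values), the two duration patterns, and the farthest-first seeding are hypothetical choices made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length duration vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(durations):
    """Scale one word's segment durations to sum to 1, removing overall speaking rate."""
    total = sum(durations)
    return [d / total for d in durations]


def synthetic_words(per_pattern=10, seed=1):
    """Hypothetical data: 3-syllable words built from two made-up duration patterns (ms)
    with +/-10% random jitter. These are not measurements from the paper."""
    rng = random.Random(seed)
    patterns = [[80, 160, 90], [170, 70, 80]]
    return [[d * rng.uniform(0.9, 1.1) for d in pattern]
            for pattern in patterns for _ in range(per_pattern)]


def farthest_first(points, k, seed=0):
    """Seed k centroids: one random point, then repeatedly the point farthest from all chosen seeds."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update
    until the labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b); needs at least two non-empty clusters."""
    clusters = set(labels)
    values = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: silhouette defined as 0
            values.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
                for c in clusters if c != labels[i])
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


if __name__ == "__main__":
    points = [normalize(w) for w in synthetic_words()]
    labels, _ = kmeans(points, k=2)
    scores = silhouette_values(points, labels)
    # Text stand-in for a silhouette plot: one bar per word, grouped by cluster, sorted descending.
    for c in sorted(set(labels)):
        for v in sorted((s for s, lab in zip(scores, labels) if lab == c), reverse=True):
            print(f"cluster {c} | {'#' * max(0, round(v * 40))} {v:.2f}")
    print(f"mean silhouette: {sum(scores) / len(scores):.2f}")
```

Normalizing each word's durations is one design choice. It compares only the relative timing pattern and ignores how fast the word was spoken. The study may have used absolute durations, a different distance measure, or a different number of clusters.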

Similar Papers

HMM-based Finnish text-to-speech system utilizing glottal inverse filtering
Tuomo Raitio ... Martti Vainio
22 Sep 2008

Effect of Speaking Rate on Recognition of Synthetic and Natural Speech by Normal-Hearing and Cochlear Implant Listeners
Caili Ji ... John J Galvin
Ear & Hearing | VOL. 34
01 May 2013

A computational model of word segmentation from continuous speech using transitional probabilities of atomic acoustic events
Okko Räsänen
Cognition | VOL. 120
27 Apr 2011

Automatic Speech Segmentation with the Application of the Czech TTS System
Petr Horák ... Betty Hesounová
01 Jan 1999
