Phone duration modeling: overview of techniques and performance optimization via feature selection in the context of emotional speech

Alexandros Lazaridis, Theodoros Kostoulas, Iosif Mporas, Todor Ganchev, Nikos Fakotakis

doi:10.1007/s10772-010-9077-x

Abstract

Accurate modeling of prosody is a prerequisite for producing high-quality synthetic speech. Phone duration, one of the key prosodic parameters, plays an important role in generating emotional synthetic speech that sounds natural. In the present work we offer an overview of various phone duration modeling techniques and then evaluate ten models, based on decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms, which over the past decades have been used successfully in various modeling tasks. Furthermore, we study the opportunity for performance optimization by applying two feature selection techniques, RReliefF and Correlation-based Feature Selection, to a large set of numerical and nominal linguistic features extracted from text (phonetic, phonological and morphosyntactic features) that have been reported successful in phone and syllable duration modeling. We investigate the practical usefulness of these phone duration modeling techniques on a Modern Greek emotional speech database, which consists of five categories of emotional speech: anger, fear, joy, neutral and sadness. The experimental results demonstrated that feature selection significantly improves the accuracy of phone duration prediction regardless of the type of machine learning algorithm used for phone duration modeling. Specifically, in four out of the five categories of emotional speech, feature selection improved phone duration modeling compared to the case without feature selection. The M5p-tree-based phone duration model was observed to achieve the best phone duration prediction accuracy in terms of root mean square error (RMSE) and mean absolute error (MAE).
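
The two accuracy measures named in the abstract, RMSE and MAE, can be illustrated with a minimal Python sketch. The function names and the duration values below are hypothetical and chosen only for illustration; they are not taken from the paper or its database.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical phone durations in milliseconds (illustrative values only).
actual_ms = [80.0, 120.0, 60.0, 100.0]
predicted_ms = [70.0, 130.0, 65.0, 90.0]

print(f"RMSE: {rmse(actual_ms, predicted_ms):.2f} ms")
print(f"MAE:  {mae(actual_ms, predicted_ms):.2f} ms")
```

RMSE penalizes large individual errors more heavily than MAE does, so the two values together give a rough sense of whether a model's errors are evenly spread or dominated by a few badly predicted phones.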

Journal: International Journal of Speech Technology
Publication Date: Jul 30, 2010
Citations: 38

Similar Papers

Feature Selection for Improved Phone Duration Modeling of Greek Emotional Speech
Alexandros Lazaridis ... Todor Ganchev
01 Jan 2009

Phone Duration Modeling of Affective Speech Using Support Vector Regression
Alexandros Lazaridis ... Todor Ganchev
International Journal of Intelligent Systems and Applications | VOL. 4
02 Jul 2012

Comparative Evaluation of Phone Duration Models for Greek Emotional Speech
Lazaridis
Journal of Computer Science | VOL. 6
01 Mar 2010

The development of phone duration model in speech synthesis in the Serbian language
Sandra Sovilj-Nikic ... Ivan Sovilj-Nikic
01 Nov 2015
