Sigma-Lognormal Modeling of Speech

C Carmona-Duarte, A Gómez-Rodellar, R Plamondon, M A Ferrer, P Gómez-Vilda

doi:10.1007/s12559-020-09803-8

Open Access

Abstract

Human movement studies and analyses have been fundamental in many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, health care to sports, and beyond. Previous speech motor models were proposed to understand how speech movement is produced and how the resulting speech varies when some parameters are changed. However, the inverse approach, in which the muscular response parameters and the subject’s age are derived from real continuous speech, is not possible with such models. In the handwriting field, by contrast, the kinematic theory of rapid human movements and its associated Sigma-lognormal model have been applied successfully to obtain the muscular response parameters. This work presents a speech kinematics-based model that can be used to study, analyze, and reconstruct complex speech kinematics in a simplified manner. A method based on the kinematic theory of rapid human movements and its associated Sigma-lognormal model is applied to describe and parameterize the asymptotic impulse response of the neuromuscular networks involved in speech as a response to a neuromotor command. The method used to transform formants into a movement observation is also presented. Experiments carried out with the (English) VTR-TIMIT database and the (German) Saarbrucken Voice Database, including people of different ages, with and without laryngeal pathologies, corroborate the link between the extracted parameters and aging, on the one hand, and the proportion between the first and second formants required in applying the kinematic theory of rapid human movements, on the other. The results should drive innovative developments in the modeling and understanding of speech kinematics.
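
As a rough illustration of the Sigma-lognormal model named in the abstract, the sketch below evaluates a velocity built as a vector sum of lognormal strokes, using the form commonly given in the kinematic theory literature. It is not the authors' implementation: the function names, the dictionary keys, and all parameter values are assumptions chosen for illustration, and the formant-to-movement transformation described in the paper is not reproduced here.

```python
import math


def lognormal_speed(t, D, t0, mu, sigma):
    """Speed of one lognormal stroke at time t.

    D is the stroke amplitude, t0 the command onset time, and mu and sigma
    the log-time delay and log-response time. The speed is zero up to t0.
    """
    if t <= t0:
        return 0.0
    x = t - t0
    z = (math.log(x) - mu) / sigma
    return D / (sigma * math.sqrt(2.0 * math.pi) * x) * math.exp(-0.5 * z * z)


def stroke_angle(t, t0, mu, sigma, theta_start, theta_end):
    """Direction of one stroke at time t, moving from theta_start to theta_end
    in proportion to the fraction of the stroke already travelled."""
    if t <= t0:
        return theta_start
    z = (math.log(t - t0) - mu) / (sigma * math.sqrt(2.0))
    return theta_start + 0.5 * (theta_end - theta_start) * (1.0 + math.erf(z))


def sigma_lognormal_velocity(t, strokes):
    """Vector sum of all strokes at time t, returned as (vx, vy)."""
    vx = 0.0
    vy = 0.0
    for s in strokes:
        speed = lognormal_speed(t, s["D"], s["t0"], s["mu"], s["sigma"])
        angle = stroke_angle(t, s["t0"], s["mu"], s["sigma"],
                             s["theta_start"], s["theta_end"])
        vx += speed * math.cos(angle)
        vy += speed * math.sin(angle)
    return vx, vy


# Hypothetical two-stroke movement; these values are illustrative only.
strokes = [
    {"D": 1.0, "t0": 0.05, "mu": -1.6, "sigma": 0.30,
     "theta_start": 0.0, "theta_end": 0.4},
    {"D": 0.6, "t0": 0.20, "mu": -1.4, "sigma": 0.25,
     "theta_start": 2.8, "theta_end": 3.1},
]

if __name__ == "__main__":
    for k in range(0, 11):
        t = 0.1 * k
        vx, vy = sigma_lognormal_velocity(t, strokes)
        print(f"t={t:.1f} s  vx={vx:+.4f}  vy={vy:+.4f}")
```

Each stroke is described by a small set of parameters (D, t0, mu, sigma and two angles), which is what makes the inverse problem of recovering them from an observed velocity profile tractable; the area under a single stroke's speed curve equals its amplitude D.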

Highlights

Human movement studies and analyses have been fundamental in many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, health care to sports, and beyond.
Computational systems that synthesize and assess speech motor control provide answers to some questions regarding the articulator movements used by humans to produce speech sounds, speech rate effects, or, for example, how infants acquire the motor skills needed to produce the speech sounds of their native language [1].
In this paper, we propose a novel methodology based on the Sigma-lognormal model to parameterize the speech kinematics and the muscular response produced by the complex set of muscles involved in achieving the target sound, as well as to study aging effects.

Summary

Introduction

Human movement studies and analyses have been fundamental in many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, health care to sports, and beyond. As the velocity is a function of α (the proportion between formants), the optimum values of this constant are estimated in this experiment so that they can be compared with the theoretical estimation in “Formants to End Effector Kinematics.” For this assessment, we employed the publicly available VTR-TIMIT database of continuous speech, which is labeled by phonemes, providing the number of phonemes (Np) in each sentence. The parameter increases with age, indicating that the impulse response of the system is slower in older speakers. This difference is only observed with the “from speech to formant (sl)” method (Table 2), which could be because this formant extraction method always gives the requested number of formants in every frame, allowing the best interpolation of the complete movement in the case of consonants. If we compare the three age groups (Table 5) using only the NbLog parameter, significant differences are found between the three classes (Fig. 9).

Discussion

Conclusions