Abstract
An articulatory model of speech production is created for the purpose of studying the links between speech production and perception. A computationally effective method for speech inversion is proposed, using a two-pole predictor structure in order to maintain better articulatory dynamics when compared to conventional dynamic programming methods. Preliminary tests of the effect of inversion are performed for 2500 Finnish syllables extracted from continuous speech, consisting of 125 different syllable classes. A cluster selectivity test shows that the syllables are more reliably clustered using the automatically obtained parametric representation of articulatory gestures than the original formant representation that is used as a starting point for the inversion.

Index Terms: Articulatory model, speech inversion, motor theory, vocal tract

1. Introduction to articulatory modeling

Speech events are more conveniently described in articulatory than in acoustic terms. Individual articulators move rather slowly and smoothly when compared to the spectral characteristics of speech signals. Since the relative trajectories of the different articulators remain rather similar in the production of speech sounds regardless of the speaker, modeling speech perception with articulatory modeling may help to overcome many of the problems that arise from ambiguity in the purely acoustic domain. In the 20th century, research in the area of articulatory modeling boomed when first electrical and then digital models of speech production could be implemented. Researchers have often referred to the articulatory models developed by Coker [1], Mermelstein [2], or Maeda [3], for example, when studying the speech inverse problem. The seven articulatory parameters of Maeda's model were estimated from x-ray tracings using so-called arbitrary factor analysis in order to make the parameters maximally uncorrelated with each other.
Mermelstein’s geometrical articulatory model depicts the positions of the articulators in the midsagittal plane. The lips, jaw, tongue, velum, and hyoid are treated as movable structures. In the 1990s and 2000s, more complex vocal tract and tongue models were developed. For example, Dang and Honda created a 3D articulatory model that used physiological constraints typical of human articulation in inverting vowel-to-vowel sequences [4].
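The two-pole predictor mentioned in the abstract can be illustrated with a minimal sketch: a second-order (two-pole) autoregressive predictor estimates each articulatory parameter sample from the two previous ones, and blending that prediction with the raw observation damps frame-to-frame jitter while preserving slow articulator movements. The pole location, blend weight, and function name below are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def two_pole_smooth(traj, a1=1.8, a2=-0.81, alpha=0.5):
    """Smooth a 1-D articulatory parameter trajectory with a
    two-pole (second-order autoregressive) predictor.

    a1 and a2 place a double pole at z = 0.9 (an assumed, not
    published, value); alpha is an assumed blend weight between
    the prediction and the raw observation.
    """
    out = np.array(traj, dtype=float)
    for n in range(2, len(out)):
        pred = a1 * out[n - 1] + a2 * out[n - 2]  # predict from two past samples
        out[n] = alpha * pred + (1 - alpha) * out[n]  # blend with observation
    return out

# Noisy step-like trajectory: smoothing reduces sample-to-sample variation
# while the overall step shape (a slow articulator movement) is retained.
rng = np.random.default_rng(0)
raw = np.concatenate([np.zeros(20), np.ones(20)]) + 0.1 * rng.standard_normal(40)
smooth = two_pole_smooth(raw)
print(np.abs(np.diff(smooth)).mean(), np.abs(np.diff(raw)).mean())
```

Because the recursion acts as a stable low-pass filter, the smoothed trajectory varies less between adjacent frames than the raw one, which is the kind of articulatory dynamics the predictor structure is meant to maintain.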