Abstract

In the human head there is a strong structural linkage between the vocal tract and facial behavior during speech. For a robotic talking head to exhibit human-like behavior, this linkage should be emulated. One way to do so is to estimate the articulatory features that produce a given utterance and then transform them into facial animation. We present a computational model of human vocalization aimed at describing the articulatory mechanisms that produce spoken phonemes. It uses a set of fuzzy rules and genetic optimization: the former represents the relationships between places of articulation and speech acoustic parameters, while the latter estimates the degrees of membership of the places of articulation. That is, the places of articulation are treated as fuzzy sets whose degrees of membership are the articulatory features. The resulting trajectories of articulatory parameters can be used to control a graphical or mechanical talking head. We verify the model by generating artificial sentences from the articulatory description; subjective listening tests of these sentences yielded an average phonetic accuracy of about 79%. Through the analysis of a large amount of natural speech, the algorithm can be used to learn the places of articulation of all phonemes of a given speaker.
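
To make the abstract's pipeline concrete, the following is a minimal sketch of the idea that places of articulation are fuzzy sets whose membership degrees are estimated by a genetic algorithm so that a fuzzy rule base reproduces observed acoustic parameters. The place names, the toy rule base, and the formant-like targets are illustrative assumptions, not the paper's actual rules or data.

```python
import random

# Hypothetical places of articulation treated as fuzzy sets; the degrees of
# membership (values in [0, 1]) are the articulatory features to estimate.
PLACES = ["bilabial", "alveolar", "velar", "open_jaw", "lip_rounding"]

def fuzzy_rules(memberships):
    """Toy stand-in for the fuzzy rule base: map a membership vector to a few
    acoustic parameters (pseudo-formants, in Hz). Coefficients are invented."""
    m = dict(zip(PLACES, memberships))
    f1 = 300 + 500 * m["open_jaw"]
    f2 = 900 + 1200 * m["alveolar"] - 400 * m["lip_rounding"]
    f3 = 2300 + 500 * m["velar"] - 300 * m["bilabial"]
    return [f1, f2, f3]

def fitness(memberships, target_acoustics):
    """Negative squared error between predicted and observed acoustics."""
    predicted = fuzzy_rules(memberships)
    return -sum((p - t) ** 2 for p, t in zip(predicted, target_acoustics))

def genetic_estimate(target_acoustics, pop_size=60, generations=200,
                     mutation_rate=0.1):
    """Estimate the membership vector for one speech frame with a simple GA."""
    pop = [[random.random() for _ in PLACES] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, target_acoustics), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(PLACES))
            child = a[:cut] + b[cut:]                       # one-point crossover
            child = [min(1.0, max(0.0, g + random.gauss(0, 0.1)))
                     if random.random() < mutation_rate else g
                     for g in child]                        # clipped mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda ind: fitness(ind, target_acoustics))

if __name__ == "__main__":
    observed = [600, 1400, 2400]   # hypothetical acoustic targets for one frame
    best = genetic_estimate(observed)
    print(dict(zip(PLACES, (round(x, 2) for x in best))))
```

Running such an estimate frame by frame would yield the trajectories of articulatory parameters mentioned above, which could then drive a graphical or mechanical talking head.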
