Abstract

In this paper, we present recent developments of the HMM-based acoustic-to-articulatory inversion approach that we are developing for a "visual articulatory feedback" system. In this approach, multi-stream phoneme HMMs are trained jointly on synchronous streams of acoustic and articulatory data, acquired by electromagnetic articulography (EMA). Acoustic-to-articulatory inversion is achieved in two steps. Phonetic and state decoding is performed first. Articulatory trajectories are then inferred from the decoded phone and state sequence using the maximum-likelihood parameter generation algorithm (MLPG). We introduce here a new procedure for the re-estimation of the HMM parameters, based on the Minimum Generation Error (MGE) criterion. We also investigate the use of model adaptation techniques based on maximum likelihood linear regression (MLLR), as a first step toward a multi-speaker visual articulatory feedback system.

Index Terms: Acoustic-to-articulatory inversion, Electromagnetic Articulography (EMA), Hidden Markov Model (HMM), Minimum Generation Error (MGE), Speaker adaptation, Maximum Likelihood Linear Regression (MLLR).
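To make the MLPG step of the second stage concrete, the following is a minimal NumPy sketch of trajectory generation from a decoded state sequence. It is an illustration under simplifying assumptions rather than the system described above: the helper `mlpg`, its signature, the restriction to a single feature dimension with diagonal covariances, and the delta window 0.5 * (c[t+1] - c[t-1]) are all assumptions of this sketch.

```python
import numpy as np

def mlpg(means, variances):
    """Generate a smooth static trajectory from per-frame Gaussian
    statistics (hypothetical helper; one feature dimension, diagonal
    covariance, delta window 0.5 * (c[t+1] - c[t-1])).

    means, variances : arrays of shape (T, 2), columns = (static, delta)
    returns          : array of shape (T,), the static trajectory c
    """
    T = means.shape[0]
    # Window matrix W maps the static trajectory c to the stacked
    # (static, delta) observation vector: o = W c, with o of length 2T.
    W = np.zeros((2 * T, T))
    W[:T, :] = np.eye(T)                # static rows: o[t] = c[t]
    for t in range(T):                  # delta rows: 0.5*(c[t+1]-c[t-1])
        if t > 0:
            W[T + t, t - 1] = -0.5
        if t < T - 1:
            W[T + t, t + 1] = 0.5
    mu = np.concatenate([means[:, 0], means[:, 1]])
    prec = 1.0 / np.concatenate([variances[:, 0], variances[:, 1]])
    # Maximising the Gaussian likelihood of o = W c reduces to the
    # weighted least-squares normal equations (W' P W) c = W' P mu,
    # where P is the diagonal precision matrix.
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)
```

In practice the normal-equations matrix is banded, so efficient implementations solve it with a banded Cholesky factorisation in O(T) time rather than the dense solve used here for readability; the same construction would be applied independently to each articulatory (EMA coil) coordinate.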
