An Analysis of HMM-based prediction of articulatory movements

Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi

doi:10.1016/j.specom.2010.06.006

Open Access

https://doi.org/10.1016/j.specom.2010.06.006

Abstract

This paper presents an investigation into predicting the movement of a speaker’s mouth from text input using hidden Markov models (HMMs). A corpus of human articulatory movements, recorded by electromagnetic articulography (EMA), is used to train the HMMs. To predict articulatory movements for input text, a suitable model sequence is selected and a maximum-likelihood parameter generation (MLPG) algorithm is used to generate the output articulatory trajectories. Unified acoustic-articulatory HMMs are introduced to integrate acoustic features when an acoustic signal is also provided with the input text. Several aspects of this method are analyzed in this paper, including the effectiveness of context-dependent modeling, the role of supplementary acoustic input, and the appropriateness of certain model structures for the unified acoustic-articulatory models. When text is the sole input, we find that fully context-dependent models significantly outperform monophone and quinphone models, achieving an average root mean square (RMS) error of 1.945 mm and an average correlation coefficient of 0.600. When both text and acoustic features are given as input to the system, the difference between the performance of quinphone models and fully context-dependent models is no longer significant. The best performance overall is achieved using unified acoustic-articulatory quinphone HMMs with separate clustering of acoustic and articulatory model parameters, a synchronous-state sequence, and a dependent-feature model structure, with an RMS error of 0.900 mm and a correlation coefficient of 0.855 on average. Finally, we also apply the same quinphone HMMs to the acoustic-articulatory, or inversion, mapping problem, where only acoustic input is available. In this setting, an average RMS error of 1.076 mm and an average correlation coefficient of 0.812 are achieved. Taken together, our results demonstrate how text and acoustic inputs both contribute to the prediction of articulatory movements in the method used.
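
The abstract reports its results as an average RMS error (in mm) and an average correlation coefficient between predicted and measured articulator trajectories. As a rough illustration of how such figures can be computed, here is a minimal sketch. It assumes each EMA channel is a plain list of positions in mm and that the per-channel scores are averaged with equal weight. The function names and the averaging scheme are assumptions made for illustration; the abstract does not specify the paper's exact evaluation procedure.

```python
import math


def rms_error(measured, predicted):
    """Root mean square error between two equal-length trajectories (mm)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def correlation(measured, predicted):
    """Pearson correlation coefficient between two trajectories."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("trajectories must be non-empty and of equal length")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Undefined when either trajectory is constant (zero variance).
    return cov / math.sqrt(var_m * var_p)


def average_over_channels(metric, measured_channels, predicted_channels):
    """Unweighted mean of a metric over channels (assumed averaging scheme)."""
    scores = [metric(m, p) for m, p in zip(measured_channels, predicted_channels)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic toy data, not taken from the paper.
    measured = [[0.0, 1.0, 2.0], [5.0, 4.0, 6.0]]
    predicted = [[0.5, 1.5, 2.5], [5.0, 4.5, 5.5]]
    print(average_over_channels(rms_error, measured, predicted))
    print(average_over_channels(correlation, measured, predicted))
```

The two metrics are complementary: RMS error measures absolute distance, while correlation measures whether the predicted trajectory follows the shape of the measured one, regardless of any constant offset.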

Journal: Speech Communication
Publication Date: Jun 30, 2010
Citations: 64
License type: other-oa

Similar Papers

HMM-based text-to-articulatory-movement prediction and analysis of critical articulators
Zhen-Hua Ling ... Korin Richmond
26 Sep 2010

Is average RMSE appropriate for evaluating acoustic-to-articulatory inversion?
Qiang Fang
01 Nov 2019

Minimum generation error training for HMM-based prediction of articulatory movements
Tian-Yi Zhao ... Li-Rong Dai
01 Nov 2010

Estimation of ground reaction forces and ankle moment with multiple, low-cost sensors.
Daniel A Jacobs ... Daniel P Ferris
Journal of NeuroEngineering and Rehabilitation | VOL. 12
14 Oct 2015
