Abstract

This paper describes acoustic-to-articulatory inversion mapping using a Gaussian Mixture Model (GMM). The correspondence between acoustic and articulatory parameters is modeled by a GMM trained on parallel acoustic-articulatory data. We measure the performance of the GMM-based mapping and investigate the effectiveness of using multiple acoustic frames as the input feature and of using multiple mixture components. The results show that although increasing the number of mixtures reduces the estimation error, it also causes many discontinuities in the estimated articulatory trajectories. To address this problem, we apply maximum likelihood estimation (MLE) considering articulatory dynamic features to the GMM-based mapping. Experimental results demonstrate that MLE using dynamic features estimates more appropriate articulatory movements than the GMM-based mapping smoothed with a low-pass filter.
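
The two estimators contrasted above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a joint-density GMM over [x; y] whose parameters have already been trained on the parallel data, and it uses the conditional-mean (MMSE) form of GMM-based mapping. For the MLE step it assumes that per-frame static and delta means and variances for a single articulatory dimension have already been derived from the GMM (for instance from the most likely mixture at each frame). The function names `gmm_mmse_mapping`, `delta_matrix` and `mle_trajectory`, and the delta window 0.5 * (y[t+1] - y[t-1]), are hypothetical choices made for illustration.

```python
import numpy as np

# Sketch only: hypothetical names; assumes a pre-trained joint GMM and,
# for the MLE step, precomputed per-frame static/delta statistics.


def gmm_mmse_mapping(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """MMSE mapping E[y | x] under a joint-density GMM on [x; y].

    x: (T, Dx) acoustic frames; weights: (M,); mu_x: (M, Dx); mu_y: (M, Dy);
    cov_xx: (M, Dx, Dx); cov_yx: (M, Dy, Dx) cross-covariance blocks.
    """
    T, dx = x.shape
    M = weights.shape[0]
    # Log posterior of each mixture given x_t, from the marginal N(x; mu_x, cov_xx)
    log_post = np.empty((T, M))
    for m in range(M):
        diff = x - mu_x[m]
        _, logdet = np.linalg.slogdet(cov_xx[m])
        maha = np.einsum("ti,ij,tj->t", diff, np.linalg.inv(cov_xx[m]), diff)
        log_post[:, m] = (np.log(weights[m])
                          - 0.5 * (maha + logdet + dx * np.log(2 * np.pi)))
    # Normalize in the log domain for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Posterior-weighted sum of per-mixture conditional means
    y = np.zeros((T, mu_y.shape[1]))
    for m in range(M):
        # (x - mu_x) cov_xx^{-1} cov_xy, using solve instead of an explicit inverse
        gain_t = np.linalg.solve(cov_xx[m], cov_yx[m].T)  # (Dx, Dy)
        y += post[:, [m]] * (mu_y[m] + (x - mu_x[m]) @ gain_t)
    return y


def delta_matrix(T):
    """Delta window dy_t = 0.5 * (y_{t+1} - y_{t-1}), zero-padded at the edges."""
    D = np.zeros((T, T))
    idx = np.arange(T - 1)
    D[idx + 1, idx] = -0.5  # coefficient on y_{t-1}
    D[idx, idx + 1] = 0.5   # coefficient on y_{t+1}
    return D


def mle_trajectory(mean_static, var_static, mean_delta, var_delta):
    """Static trajectory y (T,) maximizing N([y; D y]; [mean_static; mean_delta],
    diag([var_static; var_delta])) for one articulatory dimension."""
    T = mean_static.shape[0]
    W = np.vstack([np.eye(T), delta_matrix(T)])  # (2T, T): stacks static and delta
    mean = np.concatenate([mean_static, mean_delta])
    prec = 1.0 / np.concatenate([var_static, var_delta])
    # Normal equations: (W^T P W) y = W^T P mean, with P = diag(prec)
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mean)
    return np.linalg.solve(A, b)
```

In this sketch the MLE trajectory is solved jointly over all frames, so a jump between adjacent frames that conflicts with the delta statistics is penalized directly, and the amount of smoothing varies from frame to frame with the variances. A low-pass filter applied after frame-by-frame mapping, by contrast, smooths with a fixed cutoff regardless of the model.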
