Abstract
We present a framework for speech recognition that accounts for hidden articulatory information. We model the articulatory space using a codebook of articulatory configurations geometrically derived from EMA measurements available in the MOCHA database. The articulatory parameter set we derive is in the form of Maeda parameters. In turn, these parameters drive a physiologically motivated articulatory speech synthesizer based on the model by Sondhi and Schroeter. We use the distortion between the speech synthesized from each of the articulatory configurations and the original speech as features for recognition. We set up a segmented phoneme recognition task on the MOCHA database using Gaussian mixture models (GMMs). Improvements are achieved when combining the probability scores generated using the distortion features with the scores using acoustic features.
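To make the score-combination step concrete, the sketch below shows one plausible way the two GMM streams could be fused: per-phoneme GMMs are trained separately on acoustic and distortion features, and their segment-level log-likelihoods are interpolated. The combination rule, the interpolation weight, and all function names and feature dimensions here are illustrative assumptions, not the paper's reported implementation.

```python
# Hypothetical sketch: fusing acoustic and distortion GMM scores for
# segmented phoneme recognition. All names, weights, and dimensions are
# assumptions for illustration only.
import numpy as np
from sklearn.mixture import GaussianMixture


def train_phoneme_gmms(features_by_phoneme, n_components=8, seed=0):
    """Fit one diagonal-covariance GMM per phoneme label."""
    gmms = {}
    for phoneme, feats in features_by_phoneme.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag",
                              random_state=seed)
        gmm.fit(feats)
        gmms[phoneme] = gmm
    return gmms


def classify_segment(acoustic_gmms, distortion_gmms,
                     acoustic_frames, distortion_frames, weight=0.5):
    """Score one segment against every phoneme by log-linear interpolation
    of the two likelihood streams; return the best-scoring label."""
    scores = {}
    for phoneme in acoustic_gmms:
        ll_ac = acoustic_gmms[phoneme].score_samples(acoustic_frames).sum()
        ll_di = distortion_gmms[phoneme].score_samples(distortion_frames).sum()
        scores[phoneme] = weight * ll_ac + (1.0 - weight) * ll_di
    return max(scores, key=scores.get)


# Toy usage with random data for two hypothetical phonemes.
rng = np.random.default_rng(0)
acoustic_train = {"aa": rng.normal(0.0, 1.0, (200, 13)),
                  "iy": rng.normal(2.0, 1.0, (200, 13))}
distortion_train = {"aa": rng.normal(0.0, 1.0, (200, 32)),
                    "iy": rng.normal(1.0, 1.0, (200, 32))}
ac_gmms = train_phoneme_gmms(acoustic_train)
di_gmms = train_phoneme_gmms(distortion_train)
label = classify_segment(ac_gmms, di_gmms,
                         rng.normal(2.0, 1.0, (20, 13)),
                         rng.normal(1.0, 1.0, (20, 32)))
print(label)
```

Log-linear interpolation is one common way of combining scores from independently trained feature streams; the weight would typically be tuned on held-out data.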