Abstract

This paper describes an articulatory speech production model trained on an X-ray microbeam database, and presents results of using the model within a speech recognition framework. The system uses an explicit statistical model of co-articulation to increase the accuracy of articulator trajectories synthesized from time-aligned phonetic strings, as compared with X-ray traces. From these trajectories, spectral vector probability distributions are generated using a set of artificial neural networks. The production model is then used in combination with a hidden Markov model recognition system to re-score N-best utterance transcription lists. Relative reductions in the word error rate of between 11% and 18% are achieved on a small recognition task.
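As an illustrative sketch only (not the paper's implementation), N-best re-scoring of the kind described above is often realized as a weighted combination of the baseline HMM score with the secondary model's log-likelihood; the interpolation weight `alpha` and the function names here are hypothetical:

```python
# Sketch of N-best re-scoring: linearly combine HMM log-scores with
# log-likelihoods from a second model (e.g. an articulatory production
# model) and re-rank the hypothesis list. `alpha` is a hypothetical
# interpolation weight, typically tuned on held-out data.

def rescore_nbest(nbest, alpha=0.5):
    """nbest: list of (transcription, hmm_logprob, production_logprob).
    Returns the list re-ranked by the combined score, best first."""
    def combined(entry):
        _, hmm_lp, prod_lp = entry
        return (1.0 - alpha) * hmm_lp + alpha * prod_lp
    return sorted(nbest, key=combined, reverse=True)

# Usage: the production model's score promotes hypothesis "b" past "a",
# even though "a" had the higher HMM score.
hyps = [("a", -10.0, -30.0), ("b", -12.0, -20.0)]
reranked = rescore_nbest(hyps, alpha=0.5)
# reranked[0][0] == "b"
```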
