Abstract

Many studies have shown that articulatory features can significantly improve the performance of automatic speech recognition systems. Unfortunately, such features are not available at recognition time. There are two main approaches to solving this problem: a feature-based approach, the most popular example of which is acoustic-to-articulatory inversion, where the missing articulatory features are generated from the speech signal, and a model-based approach, where articulatory information is embedded in the model structure and parameters in a way that allows recognition using only acoustic features. In this paper, we propose two new methods to integrate articulatory information into a phoneme recognition system, one feature based and the other model based. In both cases, the underlying acoustic model (AM) is a deep neural network-hidden Markov model (DNN-HMM) hybrid. In the feature-based method, the articulatory inversion DNN and the acoustic model DNN are trained jointly using a linear combination of their loss functions. In the model-based method, we use the generalized distillation framework to train the AM DNN: a teacher DNN is first trained on both acoustic and articulatory features, and its outputs are then used as additional targets when the AM DNN is trained on acoustic features only. 7-fold cross-validation experiments using 42 speakers from the XRMB database showed that both proposed methods provide a performance improvement of about 22% to 25% over a DNN acoustic model trained on acoustic features only.
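The abstract does not spell out either training objective, so the following is only a minimal sketch of how the two objectives could look, assuming a PyTorch implementation. The weighting coefficient `lam`, the temperature `T`, and the argument names (`am_logits`, `inv_pred`, `teacher_logits`, etc.) are illustrative placeholders, not the authors' actual code.

```python
# Minimal sketch (assumptions noted above) of the two training objectives
# described in the abstract, written with PyTorch.
import torch
import torch.nn.functional as F

# Feature-based method: the acoustic-model DNN and the acoustic-to-articulatory
# inversion DNN are trained jointly on a linear combination of their losses.
def joint_loss(am_logits, phone_targets, inv_pred, artic_targets, lam=0.5):
    """L = L_AM + lam * L_inversion (the weighting scheme is a hypothetical choice)."""
    l_am = F.cross_entropy(am_logits, phone_targets)   # phoneme classification loss
    l_inv = F.mse_loss(inv_pred, artic_targets)        # articulatory regression loss
    return l_am + lam * l_inv

# Model-based method: generalized distillation. A teacher DNN trained on acoustic
# plus articulatory features supplies soft targets; the student AM DNN sees
# acoustic features only.
def distillation_loss(student_logits, phone_targets, teacher_logits,
                      lam=0.5, T=2.0):
    """L = (1 - lam) * hard-label CE + lam * soft-target term at temperature T."""
    hard = F.cross_entropy(student_logits, phone_targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                        # usual temperature scaling
    return (1.0 - lam) * hard + lam * soft
```

In both sketches, only the acoustic-feature branch (or the student DNN) is needed at recognition time, which is what makes the methods applicable when articulatory measurements are unavailable.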
