Abstract

This paper presents a comparative case study of the conventional GMM-HMM (Gaussian mixture model/hidden Markov model) speech recognition architecture and the more recent approach based on deep neural networks (DNNs). For years the GMM approach dominated speech recognition tasks; however, it has been surpassed with the resurgence of artificial neural networks. To illustrate both acoustic modeling frameworks, a case study was conducted with the Kaldi toolkit, using a custom speaker-independent, mid-vocabulary voice corpus for recognition of digit strings and personal name lists in Latin American Spanish on a connected-words phone dialing task. The results show a lower word error rate (WER) with DNN acoustic modeling: the DNN-HMM models (3.33% WER) achieve a 20.71% relative improvement over the best GMM-HMM result (4.20% WER).
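
The relative improvement quoted above can be checked from the two reported WER values. The following is a minimal sketch, assuming the usual definition of relative improvement (baseline minus new rate, divided by baseline); the function name is illustrative and does not come from the paper or from Kaldi.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, in percent, of new_wer with respect to baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# WER values reported in the abstract: best GMM-HMM 4.20%, DNN-HMM 3.33%.
improvement = relative_wer_improvement(4.20, 3.33)
print(f"{improvement:.2f}%")  # prints 20.71%
```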
