Abstract

The intelligent voice customer service of Inner Mongolia Electric Power serves a large number of Mongolian speakers. Its Mongolian speech recognition mainly works in a Q&A mode, in which human-machine dialogue is carried out sentence by sentence. However, because spoken sentences differ in length, training a Mongolian acoustic model based on the deep neural network-hidden Markov model (DNN-HMM) mainly uses fragment-level information from the speech and ignores the integrity of whole sentences. To address this, this paper proposes a Mongolian acoustic model based on Bi-directional Long Short-Term Memory-Connectionist Temporal Classification (BLSTM-CTC), which unifies the length of input sentences and models complete sentences by inserting BLANK features and labels. Comparison experiments on speech recognition show that BLSTM-CTC yields a lower word error rate and sentence error rate than DNN-HMM, especially the latter, with reductions of 3.57% and 4.09% respectively. This indicates that the modeling ability of BLSTM-CTC, especially for sentences, is clearly higher than that of DNN-HMM.
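
The role of the BLANK label can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the symbol `<b>`, the function names, and the padding scheme are assumptions made for illustration. The sketch shows label sequences of different lengths being padded to a common length with BLANK, the usual CTC convention of interleaving BLANK between labels, and the CTC collapse rule that maps a frame-level path back to a label sequence.

```python
BLANK = "<b>"  # hypothetical blank symbol, not taken from the paper


def pad_to_length(labels, target_len, blank=BLANK):
    """Pad a label sequence with BLANK so all sentences share one length (assumed scheme)."""
    if len(labels) > target_len:
        raise ValueError("label sequence is longer than the target length")
    return list(labels) + [blank] * (target_len - len(labels))


def extend_with_blanks(labels, blank=BLANK):
    """Standard CTC label extension: BLANK before, between, and after the labels."""
    extended = [blank]
    for label in labels:
        extended.append(label)
        extended.append(blank)
    return extended


def ctc_collapse(path, blank=BLANK):
    """CTC collapse rule: merge consecutive repeats, then drop BLANK."""
    output = []
    previous = None
    for symbol in path:
        if symbol != previous and symbol != blank:
            output.append(symbol)
        previous = symbol
    return output


if __name__ == "__main__":
    short = ["a", "b"]
    print(pad_to_length(short, 4))
    print(extend_with_blanks(short))
    # A BLANK between two identical labels keeps them as separate outputs.
    print(ctc_collapse(["a", "a", BLANK, "a", "b", BLANK]))
```

Because BLANK is removed during collapse, sentences of different lengths can be mapped to fixed-length targets without changing the label sequence that is finally recognized.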
