Abstract

Because languages differ widely, a separate speech recognition system must be built for each language. Dialects are especially difficult: they contain many special words and colloquial features, and dialect speech data is very scarce. This paper constructs a speech recognition system for the Sichuan dialect by combining a hidden Markov model (HMM) with a deep long short-term memory (LSTM) network. We created a Sichuan dialect dataset and implemented an HMM-LSTM speech recognition system on it. Compared with a deep neural network (DNN), the LSTM network overcomes the limitation that a DNN captures context from only a fixed number of input items. Moreover, to accurately recognize polyphonic characters and vocabulary with special pronunciations in the Sichuan dialect, we collect all the characters in the dataset together with their common phoneme sequences to form a lexicon. The system achieves an 11.34% character error rate on the Sichuan dialect evaluation set. To the best of our knowledge, this is the best performance reported on this corpus to date.
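
For readers unfamiliar with the metric, character error rate is conventionally the character-level edit distance between the reference transcript and the recognizer output, divided by the number of reference characters. The abstract does not say which scoring tool was used, so the following is only a minimal sketch under that standard definition; the function names `edit_distance` and `character_error_rate` are illustrative and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Total edit distance divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical transcripts: one of four reference characters is missing.
    print(character_error_rate(["你好世界"], ["你好世"]))
```

Under this definition, an 11.34% character error rate corresponds to roughly 11 character edits per 100 reference characters, summed over the evaluation set.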
