Abstract

This paper presents a large-vocabulary automatic speech recognition (ASR) system for Myanmar. To the best of our knowledge, this is the first such system for the Myanmar language. We report the main processes in developing the system, including data collection, pronunciation lexicon construction, selection of effective acoustic features, acoustic and language modeling, and evaluation criteria. Because Myanmar is a tonal language, tonal features were incorporated into acoustic modeling, and their effectiveness was verified. Differences between a word-based language model (LM) and a syllable-based LM were investigated; the word-based LM was found to be superior to the syllable-based LM. To sidestep the ambiguity in defining Myanmar words and to make the recognition results more reliable, we examined the characteristics of the Myanmar language and proposed the syllable error rate (SER) as a suitable evaluation criterion for Myanmar ASR systems. Three acoustic models, one Gaussian Mixture Model (GMM) and two Deep Neural Networks (DNNs), were explored using only the developed phonemically balanced corpus of 4K sentences and 40 hours of speech. Experiments were conducted on an open evaluation set of 100 utterances spoken by 25 speakers. The DNN trained with sequence-discriminative training achieved a word error rate (WER) of 15.63%, or an SER of 10.87%.
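
The relationship between the two evaluation criteria can be illustrated with a minimal sketch. It assumes that both WER and SER are computed in the conventional way, as the Levenshtein (edit) distance between the reference and hypothesis token sequences divided by the reference length, and that segmentation into words or syllables has already been done. The function names and the Latin placeholder tokens are hypothetical and are not taken from the system described here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, insertions, and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Token error rate: WER for word tokens, SER for syllable tokens."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Placeholder tokens (assumption): the hypothesis contains the same syllables
# as the reference but groups them into words differently.
ref_words = ["ab", "cd", "e"]
hyp_words = ["ab", "c", "d", "e"]
ref_syllables = ["a", "b", "c", "d", "e"]
hyp_syllables = ["a", "b", "c", "d", "e"]

wer = error_rate(ref_words, hyp_words)          # penalizes the segmentation difference
ser = error_rate(ref_syllables, hyp_syllables)  # unaffected by word boundaries
```

In this toy case the recognized syllables are identical, yet the word-level score counts two errors because the word boundaries differ. A syllable-level score does not depend on how words are defined.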
