Abstract

This paper introduces work on automatic speech recognition (ASR) of Myanmar spontaneous speech. The recognizer is based on Gaussian Mixture Models and Hidden Markov Models (GMM-HMM). A baseline ASR system is developed with a 20.5 h spontaneous speech corpus and then refined with several speaker adaptation methods. Five kinds of adapted acoustic models are explored: Maximum A Posteriori (MAP), Maximum Mutual Information (MMI), Minimum Phone Error (MPE), Maximum Mutual Information in both feature space and model space (fMMI), and Subspace GMM (SGMM). These adapted models are evaluated on a spontaneous evaluation set consisting of 100 utterances from 61 speakers, totaling 23 min 19 s. Experiments on this speech corpus show significant improvement from speaker-adaptive training, and the SGMM-based acoustic model performs better than the other adaptive models, significantly reducing WER by 3.16% compared with the baseline GMM model. Deep Neural Network (DNN) training is also investigated on the same corpus and evaluated with the same evaluation set; with DNN training, the result reaches 31.5% WER.
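All of the models above are compared by word error rate (WER), the standard ASR metric: the word-level edit distance between the reference transcript and the hypothesis, divided by the number of reference words. As a minimal sketch of how WER is computed (this helper is an illustration, not the scoring tool used in the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = min edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            substitution = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(r)][len(h)] / len(r)
```

For example, a hypothesis with one substituted word out of four reference words scores a WER of 0.25; the 3.16% reduction reported above is the absolute drop in this percentage relative to the baseline GMM model.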
