An HMM-based method for Thai spelling speech recognition

C Pisarn, T Theeramunkong

doi:10.1016/j.camwa.2006.10.030

Abstract

Spelling speech recognition can serve several purposes, including enhancing speech recognition systems and implementing name retrieval systems. This paper presents an approach, based on hidden Markov models (HMMs), to constructing three recognizers for the three commonly used Thai spelling methods. The Thai phonetic characteristics, alphabet system and spelling methods are analyzed. For the first spelling method, two recognizers are explored: one trained on a small spelling corpus and the other on an existing large continuous speech corpus. To compensate for the difference in utterance speed between spelling utterances and continuous speech utterances, an adjustment of utterance speed is applied. Two alternative language models, bigram and trigram, are investigated to evaluate the performance of spelling speech recognition under three different environments: close-type, open-type and mix-type language models. For the first spelling method, our approach achieves up to 93.09% letter correct rate (LCR) and 92.45% letter accuracy (LA) when the language model is a trigram under the mix-type environment and the acoustic model is trained on the small spelling corpus. Under the same conditions, we obtained 81.12% LCR and 76.32% LA for the second spelling method and 78.47% LCR and 71.75% LA for the third spelling method. Analysis of the results shows that the main source of errors is letter substitution, which is mostly triggered by confusion between similar consonant phones and between short/long vowel pairs.
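The abstract reports two scores, LCR and LA, without defining them. As a rough illustration only, the sketch below assumes the conventional speech recognition scoring definitions: with N reference letters, S substitutions, D deletions and I insertions, LCR = (N - S - D) / N and LA = (N - S - D - I) / N, both as percentages. The function names, the unit-cost alignment and the Latin placeholder letters are assumptions made for this example and are not taken from the paper.

```python
def align_counts(ref, hyp):
    """Count (substitutions, deletions, insertions) along one
    minimum-edit-distance alignment of hyp against ref.

    Assumption: every error type costs 1; real scoring tools may
    weight the error types differently.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)      # all reference letters deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)      # all hypothesis letters inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, ins)              # match, no error
            else:
                diag = (t + 1, s + 1, d, ins)      # substitution
            t, s, d, ins = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, ins)
            t, s, d, ins = cost[i][j - 1]
            insertion = (t + 1, s, d, ins + 1)
            cost[i][j] = min(diag, deletion, insertion, key=lambda c: c[0])
    _, s, d, ins = cost[n][m]
    return s, d, ins


def letter_scores(ref, hyp):
    """Return (LCR, LA) in percent under the assumed definitions."""
    s, d, ins = align_counts(ref, hyp)
    n = len(ref)
    hits = n - s - d                   # correctly recognized letters
    lcr = 100.0 * hits / n
    la = 100.0 * (hits - ins) / n      # insertions are penalized only in LA
    return lcr, la


# Hypothetical example: one substitution (c -> x) and one insertion (f).
reference = list("abcde")
hypothesis = list("abxdef")
lcr, la = letter_scores(reference, hypothesis)
```

Under these assumed definitions LA can never exceed LCR, because it additionally subtracts insertions. This matches the pattern in the reported figures, where each LA value is lower than the corresponding LCR value.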

Full Text