Amharic is the second most spoken Semitic language after Arabic. It has its own syllabic writing system, in which each character represents a consonant and a vowel. Automatic speech recognition (ASR) research for Amharic has so far relied on grapheme-based pronunciation lexicons, taking advantage of the nature of this writing system. However, the epenthetic vowel and the glottal stop consonant represented in the writing system are not pronounced in all of their occurrences. Moreover, the writing system does not distinguish geminated from non-geminated consonants. The grapheme-based pronunciation lexicons used so far are therefore limited with regard to these language features. To address these limitations, we have prepared word- and morpheme-based pronunciation lexicons using data-driven and knowledge-driven (expert) transcription. The data-driven transcription was used to prepare the training pronunciation lexicon, while the knowledge-driven transcription was used to prepare the morpheme- and word-based pronunciation lexicons for decoding. With the morpheme-based knowledge-driven lexicons, ASR performance improved over the baseline system that used a grapheme-based lexicon, even though the number of phones is considerably larger (60, compared with 37 in the grapheme-based lexicon).
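To make the grapheme-based approach concrete, the following is a minimal sketch of how a pronunciation entry can be derived directly from the script. It is not the authors' implementation. It assumes the regular layout of the Unicode Ethiopic block, where a basic consonant occupies a row of eight code points and the position within the row gives the vowel order. The function name, the phone labels (`ae`, `ix`, and so on) and the three-consonant table are hypothetical and chosen only for illustration.

```python
# Minimal sketch of a grapheme-based pronunciation lookup (illustrative only).
# Assumption: basic Ethiopic consonant rows start at multiples of 8 from U+1200,
# and the offset within a row selects one of the seven basic vowel orders.

ETHIOPIC_BASE = 0x1200

# Hypothetical phone labels for the seven orders; "ix" stands for the
# sixth-order (epenthetic) vowel. These are not the paper's phone names.
VOWELS = ["ae", "u", "i", "a", "e", "ix", "o"]

# Tiny illustrative subset of consonant rows, keyed by the row's first code point.
CONSONANTS = {0x1208: "l", 0x1218: "m", 0x1230: "s"}


def grapheme_pronunciation(word):
    """Map each character to a consonant phone followed by a vowel phone."""
    phones = []
    for ch in word:
        offset = ord(ch) - ETHIOPIC_BASE
        row_start = ETHIOPIC_BASE + (offset // 8) * 8
        order = offset % 8
        if row_start not in CONSONANTS or order >= len(VOWELS):
            raise ValueError(f"character not covered by this sketch: {ch!r}")
        phones.extend([CONSONANTS[row_start], VOWELS[order]])
    return phones


# U+1230 U+120B U+121D spells the word commonly romanized as "selam".
print(grapheme_pronunciation("\u1230\u120b\u121d"))
# prints ['s', 'ae', 'l', 'a', 'm', 'ix']
```

The final `ix` shows the limitation described above: a purely grapheme-based mapping emits the sixth-order vowel for every such character, whether or not it is pronounced, and it has no way to mark gemination.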