Abstract

In this paper, we present our investigations towards the development of multilingual automatic speech recognition (ML ASR) systems using the GlobalPhone database. In addition to GlobalPhone, we have included four Ethiopian languages (Amharic, Oromo, Tigrigna and Wolaytta), as well as Uyghur and English, in our investigation. To assess the impact of language relatedness on ML ASR training, we have analyzed both the phonetic overlap and the morphological complexity of the languages. Deep Neural Network (DNN) based ML ASR systems have been developed using ML mix, transfer and multitask learning approaches. Relative word error rate (WER) reductions of up to 33.21% have been achieved as a result of using the resources of other languages in ML acoustic model training. Our experimental results show that languages with small amounts of monolingual training data benefit substantially from ML training. Moreover, using phonetically related languages in ML training is more beneficial than using phonetically less related languages. The nature of the corpus (single or mixed domain, noisy or noise-free, etc.) also appears to have an impact on ML training, although it is not as important as the phonetic relatedness of the languages.
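
The abstract does not include code; the sketch below is only a generic illustration of the multitask ML training idea it mentions, in which hidden layers are shared across languages while each language keeps its own output layer. All layer sizes, language names and target counts here are placeholders, not values from the paper.

```python
# Hypothetical sketch (not the authors' implementation): a DNN acoustic model
# with hidden layers shared across languages and a language-specific softmax
# head per language, as in multitask multilingual acoustic model training.
import torch
import torch.nn as nn

class SharedHiddenDNN(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=1024, num_hidden=4,
                 targets_per_lang=None):
        super().__init__()
        # Placeholder languages and tied-state (senone) counts.
        targets_per_lang = targets_per_lang or {"amharic": 2000, "oromo": 2000}
        layers, in_dim = [], feat_dim
        for _ in range(num_hidden):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        self.shared = nn.Sequential(*layers)      # hidden layers shared by all languages
        self.heads = nn.ModuleDict({               # one output layer per language
            lang: nn.Linear(hidden_dim, n) for lang, n in targets_per_lang.items()
        })

    def forward(self, feats, lang):
        # Returns logits over the chosen language's targets.
        return self.heads[lang](self.shared(feats))

# Usage: a batch of 40-dimensional acoustic feature vectors scored with one head.
model = SharedHiddenDNN()
logits = model(torch.randn(8, 40), "amharic")
```

For ML transfer learning, the shared layers of such a model would typically be reused to initialize a new model for a target language; for the ML mix approach, a single output layer over the pooled target set of all languages would replace the per-language heads.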
