Abstract

As one of China's minority languages, Tibetan has, until recently, received far less speech recognition research than Chinese and English. This, together with the relatively small Tibetan corpus, has led to unsatisfactory performance of end-to-end Tibetan speech recognition models. This paper aims to achieve accurate Tibetan speech recognition using only a small amount of Tibetan training data. We demonstrate effective methods for Tibetan end-to-end speech recognition via cross-language transfer learning from three aspects: modeling unit selection, transfer learning method, and source language selection. Experimental results show that Chinese-Tibetan multi-language learning with a multi-language character set as the modeling unit yields the best performance, a Tibetan Character Error Rate (CER) of 27.3%, which is 26.1% lower than that of the language-specific model. Our method also achieves 2.2% higher accuracy with less data than Tibetan multi-dialect transfer learning under the same model structure and data set.

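To make the setup concrete, the sketch below (not the authors' implementation) shows one way a cross-language model with a merged Chinese-Tibetan character set as the modeling unit could be pretrained on Chinese and then fine-tuned on Tibetan, assuming a PyTorch encoder with a CTC objective. The architecture, vocabulary sizes, checkpoint path, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical merged modeling units: Chinese characters + Tibetan characters
# plus blank/special symbols; the sizes here are placeholders, not paper values.
merged_vocab_size = 5000 + 220 + 3


class SpeechEncoder(nn.Module):
    """Toy acoustic encoder: a stacked BiLSTM over filterbank features."""

    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=3,
                           bidirectional=True, batch_first=True)

    def forward(self, feats):
        out, _ = self.rnn(feats)
        return out  # (batch, time, 2 * hidden)


class CTCModel(nn.Module):
    """Encoder plus a linear projection onto the merged character set (CTC head)."""

    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.encoder = SpeechEncoder(hidden=hidden)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, feats):
        return self.out(self.encoder(feats)).log_softmax(dim=-1)


# 1) Train (or load) the model on the larger Chinese corpus first.
model = CTCModel(merged_vocab_size)
# model.load_state_dict(torch.load("chinese_pretrained.pt"))  # hypothetical checkpoint

# 2) Fine-tune on the small Tibetan corpus. Because the output layer already
#    covers the merged Chinese-Tibetan character set, no layer is replaced.
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # smaller LR for fine-tuning

feats = torch.randn(4, 300, 80)                 # dummy batch of filterbank features
targets = torch.randint(1, merged_vocab_size, (4, 20))
log_probs = model(feats).transpose(0, 1)        # CTCLoss expects (time, batch, vocab)
loss = ctc_loss(log_probs, targets,
                input_lengths=torch.full((4,), 300),
                target_lengths=torch.full((4,), 20))
loss.backward()
optimizer.step()
```

Because both languages share one output layer over the merged character set, transferring from Chinese to Tibetan is plain continued training rather than surgery on the network head; this is the design choice the sketch is meant to illustrate.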