A hybrid CTC+Attention model based on end-to-end framework for multilingual speech recognition

Sendong Liang, Wei Qi Yan

doi:10.1007/s11042-022-12136-3

Open Access

https://doi.org/10.1007/s11042-022-12136-3

Abstract

Speech recognition is an important field in natural language processing. In this paper, we propose an end-to-end framework for speech recognition on multilingual datasets. End-to-end methods require neither complicated alignment nor the construction of a pronunciation dictionary, which makes them a promising approach. We implement a hybrid model of CTC and attention (CTC+Attention) in PyTorch. To compare speech recognition methods across languages, we design and create three datasets: Chinese, English, and Code-Switch. We evaluate the proposed hybrid CTC+Attention model in this multilingual environment. Our experiments show that the proposed end-to-end hybrid CTC+Attention model achieves better performance than the HMM-DNN model in both single-language and Code-Switch speaking environments. We also compare the recognition results across the different languages. On the Chinese dataset, the proposed hybrid CTC+Attention model outperformed the traditional model and reached a character error rate (CER) of 10.22%.
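
For readers unfamiliar with the metric, CER is conventionally computed as the character-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognized text, divided by the number of characters in the reference. The sketch below illustrates that conventional definition only. The function names `edit_distance` and `cer` are hypothetical, the sample strings are invented for illustration, and the abstract does not state what text normalization (whitespace, punctuation, casing) the authors applied before scoring, so none is applied here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Assumption: no normalization is applied; the paper's exact scoring
    # rules are not given in the abstract.
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Invented example: one substitution and one deletion over six characters.
    print(f"{cer('今天天气很好', '今天天汽好'):.2%}")
```

Because the distance is counted per character rather than per word, the same function applies to Chinese, English, and mixed Code-Switch transcripts without a word segmentation step. Note that CER can exceed 100% when the hypothesis contains many insertions.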

Full Text