A unified system for multilingual speech recognition and language identification

Danyang Liu, Ji Xu, Pengyuan Zhang, Yonghong Yan

doi:10.1016/j.specom.2020.12.008

https://doi.org/10.1016/j.specom.2020.12.008


Abstract

This paper presents a unified system for multilingual automatic speech recognition (ASR) and language identification (LID). In contrast to conventional multilingual ASR systems, it exploits the complementarity of the ASR and LID modules. First, the LID module contributes to the language-adaptive training of the multilingual acoustic model. Then, the ASR decoding information serves as a confidence metric to balance the LID results. To cover complex multilingual speech recognition scenarios, two LID strategies are investigated. In the first task, where the speech stream contains only one language, the language can be determined directly by an utterance-level judgment; here, a segment-level statistical component and a two-stage update strategy are designed to assist the utterance-level language classification. In the second task, where a single speech stream contains multiple languages, a Viterbi language state retrieval method based on neural network (NN) classification is used to detect the language state dynamically. In both cases, the ASR decoding information is used to adjust the language classification results. Without prior knowledge of the language identity, the enhanced LID module achieves an accuracy of 99.3% for utterance-level language judgment and 92.4% for dynamic language detection, and the multilingual ASR system achieves performance comparable to that of monolingual ASR systems.
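
To illustrate the idea behind Viterbi-based language state retrieval, the following is a minimal sketch, not the authors' implementation. It assumes an NN classifier has already produced per-segment language posteriors, and it recovers the most likely language sequence under a simple "sticky" transition model. The function name `viterbi_language_states`, the `stay_prob` parameter, and the uniform switch probability are all hypothetical choices made for this example; the abstract does not specify the transition model, and the ASR-based confidence adjustment described above is not modeled here.

```python
import math


def viterbi_language_states(posteriors, stay_prob=0.95):
    """Return the most likely language index for each segment.

    posteriors: list of per-segment lists of language probabilities
                (assumed to come from an NN classifier).
    stay_prob:  assumed probability of keeping the same language between
                consecutive segments (hypothetical, not from the paper).
    """
    n_lang = len(posteriors[0])
    if n_lang < 2:
        return [0] * len(posteriors)

    log_stay = math.log(stay_prob)
    # Remaining probability mass is split evenly over the other languages.
    log_switch = math.log((1.0 - stay_prob) / (n_lang - 1))
    eps = 1e-12  # avoids log(0) for zero posteriors

    # Uniform prior over the initial language (constant term dropped).
    score = [math.log(p + eps) for p in posteriors[0]]
    backptr = []

    for frame in posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_lang):
            # Best predecessor state for language j.
            best_i = max(
                range(n_lang),
                key=lambda i: score[i] + (log_stay if i == j else log_switch),
            )
            trans = log_stay if best_i == j else log_switch
            new_score.append(score[best_i] + trans + math.log(frame[j] + eps))
            pointers.append(best_i)
        score = new_score
        backptr.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_lang), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy posteriors for two languages; the third segment is a noisy outlier.
toy = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
       [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
print(viterbi_language_states(toy))  # [0, 0, 0, 0, 1, 1, 1]
```

A per-segment argmax on the toy input would flip to the second language at the third segment and back again. The transition penalty suppresses that isolated flip while still following the sustained switch later in the stream.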
