Abstract
In this paper, a multilingual automatic speech recognition (ASR) and language identification (LID) system is designed. In contrast to conventional multilingual ASR systems, this paper takes advantage of the complementarity of the ASR and LID modules. First, the LID module contributes to the language adaptive training of the multilingual acoustic model. Then, the ASR decoding information acts as the confidence metric to balance the LID results. To simulate complex multilingual speech recognition situations, two types of LID strategies are conducted. For a multilingual speech recognition task in which only one language is contained in the speech stream, the language information can be directly determined based on utterance-level judgment. Under this condition, a segment-level statistical component and a two-stage update strategy are designed to assist in the utterance-level language classification. In another multilingual speech recognition task, where the speech stream contains multiple languages simultaneously, the Viterbi language state retrieval method based on neural network (NN) classification is used to perform dynamic detection of the language state. In both cases, the ASR decoding information is used to adjust the language classification results. Without prior knowledge of language identity information, the enhanced LID module achieves an accuracy of 99.3% for utterance-level language judgment and 92.4% for dynamic language detection, and the multilingual ASR system also provides performance comparable to that of monolingual ASR systems.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have