Abstract
In this work, we explore different methods to integrate a complex Language Model (a hierarchical Language Model based on classes of phrases) into an automatic speech recognition (ASR) system. First, an integrated architecture is considered, in which the integration is carried out via the composition of the different Stochastic Finite-State Automata associated with the specific Language Model (LM). Second, a decoupled architecture with a two-pass decoder is employed, in which the complex LM is used to reorder the N-best list. The formal definition of both methods is provided, enabling a theoretical comparison between them. In addition, different experiments were carried out to compare the proposed approaches empirically. The results show that, although the hierarchical LMs outperform a baseline word-based LM in both cases, the integrated architecture provides better ASR system performance. However, the decoupled architecture may be more versatile, since the two-pass strategy allows different models to be integrated using a standard decoder. Moreover, the use of this kind of complex LM can also be extended to other NLP applications, such as language understanding, by employing the proposed architectures.
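The following is a minimal sketch, not taken from the paper, of the decoupled (two-pass) strategy described in the abstract: a first-pass decoder produces an N-best list, and a more complex LM re-scores and reorders it. The function names, the stand-in `phrase_class_lm_score`, and the interpolation weight are all assumptions made for illustration.

```python
# Hedged sketch of N-best rescoring in a decoupled, two-pass architecture.
# All names and parameters below are hypothetical, not from the paper.
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],        # (hypothesis, first-pass log score)
    lm_score: Callable[[str], float],      # complex-LM log score of a hypothesis
    lm_weight: float = 0.5,                # hypothetical interpolation weight
) -> List[Tuple[str, float]]:
    """Reorder an N-best list by combining first-pass and complex-LM scores."""
    rescored = [
        (hyp, first_pass + lm_weight * lm_score(hyp))
        for hyp, first_pass in nbest
    ]
    # Higher combined log score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy stand-in for a hierarchical phrase-class LM (placeholder, not a real model).
def phrase_class_lm_score(hypothesis: str) -> float:
    return -0.1 * len(hypothesis.split())


nbest_list = [
    ("set and alarm for seven", -12.1),
    ("set an alarm for seven", -12.3),
]
best_hypothesis, best_score = rescore_nbest(nbest_list, phrase_class_lm_score)[0]
```

In this sketch the first-pass decoder is left unchanged, which reflects the versatility argument in the abstract: any standard decoder that emits an N-best list can be combined with a more complex LM in the second pass.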