Abstract

Automatic speech recognition (ASR) systems use a language model scaling factor to weight the probability output by the language model and balance it against the probabilities from other models, including acoustic models. The conventional approach sets the language model scaling factor to a constant value, tuned on a given training dataset to maximize overall performance, but it is known that the optimal scaling factor varies across individual utterances. In this work, we propose a method to dynamically adjust the language model scaling factor for each utterance. The proposed method uses a recurrent neural network (RNN) based model, trained on ASR results from a training dataset, to predict the optimal scaling factor. Some studies tackled this utterance dependency in the 2000s, yet few improved ASR quality because of the difficulty of directly modeling the relationship between a sequence of acoustic features and the optimal scaling factor; recent advances in RNN technology have made this feasible. Experiments on a real-world dataset show that dynamically optimizing the language model scaling factor improves ASR quality and that the proposed method is effective.
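The score combination the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the standard log-linear combination of acoustic and language model log-probabilities, with a per-utterance scaling factor supplied by a predictor. The `predict_scale` function here is a hypothetical stand-in for the RNN-based model described in the abstract.

```python
import math

def combined_score(acoustic_logprob, lm_logprob, lm_scale):
    # Standard log-linear combination used in ASR decoding:
    #   score = log P_acoustic + lambda * log P_lm
    # where lambda is the language model scaling factor.
    return acoustic_logprob + lm_scale * lm_logprob

# Conventional approach: one constant scale for all utterances.
static_score = combined_score(-120.0, -35.0, 0.8)

def predict_scale(utterance_features):
    # Hypothetical stand-in for the paper's RNN predictor: maps
    # utterance-level features to a scaling factor near a baseline.
    # A real system would run an RNN over the acoustic feature sequence.
    return 0.8 + 0.1 * math.tanh(sum(utterance_features))

# Proposed approach: a scale predicted per utterance.
dynamic_score = combined_score(-120.0, -35.0, predict_scale([0.2, -0.1]))
```

A decoder would use such scores to rank competing hypotheses; the paper's contribution is replacing the constant `lm_scale` with a value predicted for each utterance.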

