Abstract

Automatic speech recognition (ASR) systems use a language model scaling factor to weight the probability output by the language model and balance it against the probabilities from other models, including acoustic models. The conventional approach sets the language model scaling factor to a constant value, tuned on a given training dataset to maximize overall performance, but it is known that the optimal scaling factor varies across individual utterances. In this work, we propose a method to dynamically adjust the language model scaling factor for each utterance. The proposed method uses a recurrent neural network (RNN) based model, trained on ASR results from a training dataset, to predict the optimal scaling factor. Some studies tackled this utterance dependency in the 2000s, yet few improved ASR quality because of the difficulty of directly modeling the relationship between a sequence of acoustic features and the optimal scaling factor; recent advances in RNN technology have made this feasible. Experiments on a real-world dataset show that dynamically optimizing the language model scaling factor improves ASR quality and that the proposed method is effective.
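The score combination the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the standard log-linear combination of acoustic and language model log-probabilities, with a per-utterance scaling factor supplied by a predictor. The `predict_scale` function here is a hypothetical stand-in for the RNN-based model described in the abstract.

```python
import math

def combined_score(acoustic_logprob, lm_logprob, lm_scale):
    # Standard log-linear combination used in ASR decoding:
    #   score = log P_acoustic + lambda * log P_lm
    # where lambda is the language model scaling factor.
    return acoustic_logprob + lm_scale * lm_logprob

# Conventional approach: one constant scale for all utterances.
static_score = combined_score(-120.0, -35.0, 0.8)

def predict_scale(utterance_features):
    # Hypothetical stand-in for the paper's RNN predictor: maps
    # utterance-level features to a scaling factor near a baseline.
    # A real system would run an RNN over the acoustic feature sequence.
    return 0.8 + 0.1 * math.tanh(sum(utterance_features))

# Proposed approach: a scale predicted per utterance.
dynamic_score = combined_score(-120.0, -35.0, predict_scale([0.2, -0.1]))
```

A decoder would use such scores to rank competing hypotheses; the paper's contribution is replacing the constant `lm_scale` with a value predicted for each utterance.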

