Language model adaptation via minimum discrimination information

P.S. Rao, S. Roukos, M.D. Monkowski

doi:10.1109/icassp.1995.479389

Abstract

Statistical language models improve the performance of speech recognition systems by providing estimates of the a priori probabilities of word sequences. The commonly used trigram language models estimate the conditional probability of a word given the previous two words from a large corpus of text. The text corpus is often a collection of several small, diverse segments, such as newspaper articles or conversations on different topics. Knowledge of the current topic could be used to adapt the general trigram language model so that it matches that topic more closely. For example, the general language model could be interpolated with one built on the topic data. The authors first discuss the adaptation of general trigram language models to a known topic using the minimum discrimination information (MDI) method. They then present results on the Switchboard corpus, which consists of telephone conversations on several topics.
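
The interpolation mentioned in the abstract as a simple way to adapt a general model can be sketched as follows. This is a minimal illustration only, not the MDI method the paper studies: the function names `trigram_mle` and `interpolate`, the toy corpora, and the weight `lam` are all assumptions introduced here, and the estimates are unsmoothed maximum-likelihood ratios.

```python
from collections import Counter

def trigram_mle(tokens):
    """Unsmoothed maximum-likelihood estimates of P(w3 | w1, w2)."""
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Count each two-word history only where it is followed by a third word.
    hist = Counter()
    for (w1, w2, _), count in tri.items():
        hist[(w1, w2)] += count
    return {t: count / hist[t[:2]] for t, count in tri.items()}

def interpolate(p_general, p_topic, lam):
    """Linear interpolation: lam * P_topic + (1 - lam) * P_general.

    Trigrams missing from one model contribute probability 0 from that
    model, so the result is only normalized for histories that both
    models have seen. A practical system would smooth both models first.
    """
    keys = set(p_general) | set(p_topic)
    return {
        k: lam * p_topic.get(k, 0.0) + (1.0 - lam) * p_general.get(k, 0.0)
        for k in keys
    }

# Hypothetical toy data standing in for a general corpus and topic data.
general = trigram_mle("the cat sat on the mat the cat ran".split())
topic = trigram_mle("the cat ran the cat ran".split())
adapted = interpolate(general, topic, lam=0.5)
```

In this toy case the topic data shifts probability mass for the history "the cat" toward "ran" and away from "sat", which is the kind of topic matching the abstract describes. In practice the interpolation weight would be chosen on held-out data rather than fixed by hand.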
