Statistical Models of Text in Continuous Speech Recognition

Ye‐Sho Chen

doi:10.1108/eb005896

Abstract

A major difficulty in continuous speech recognition research is the lack of an effective and objective way to evaluate statistical models of text. Herbert Simon's view on evaluating theories is applied here to the statistical modelling of text. Three contributions can be identified. First, a time‐series representation of text is used to identify three well‐known empirical laws of text generation. These laws provide an effective and objective basis for evaluating four leading statistical models of text. Second, it is shown that the Simon‐Yule model of text provides a constructive mechanism for those laws. Third, based on Simon's explanatory processes of imitation and association, an adaptive framework for continuous speech recognition is suggested.
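The abstract does not spell out the generating mechanism, so the following is only a minimal sketch of a Simon-style process as it is commonly described: at each step a new word enters the text with some fixed probability, and otherwise an earlier token is repeated, which makes a word's chance of recurring proportional to how often it has already occurred. The names `simulate_simon_text`, `rank_frequency`, and the parameter `alpha` are hypothetical and chosen for illustration; they are not taken from the paper, and the paper's own formulation may differ.

```python
import random
from collections import Counter


def simulate_simon_text(n_tokens, alpha, seed=0):
    """Generate word ids with a Simon-style process (illustrative assumption).

    alpha is the assumed probability that the next token is a new word.
    Otherwise a token is copied from a uniformly chosen earlier position,
    so frequent words are more likely to be repeated.
    """
    if n_tokens <= 0:
        return []
    rng = random.Random(seed)
    tokens = [0]  # the first token is necessarily a new word
    next_word_id = 1
    while len(tokens) < n_tokens:
        if rng.random() < alpha:
            # Introduce a word that has not appeared before.
            tokens.append(next_word_id)
            next_word_id += 1
        else:
            # Repeat an earlier token, weighted by current frequency.
            tokens.append(rng.choice(tokens))
    return tokens


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


tokens = simulate_simon_text(n_tokens=20000, alpha=0.1)
freqs = rank_frequency(tokens)
```

Plotting `freqs` against rank on log-log axes is one way to compare the simulated text with an empirical rank-frequency curve; the values of `n_tokens` and `alpha` here are arbitrary.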
