Abstract
In current speech recognition technology, the matching measure between a hypothesis and the corresponding speech segment is usually defined on the basis of the HMM likelihood. As is well known, however, the likelihood is a relative measure, and some form of normalization is necessary when hypotheses corresponding to different speech segments are to be compared. The aim of this paper is to show that the mutual information, or equivalently the likelihood normalized by the probability of the speech segment, is a better acoustic matching measure than the likelihood alone. An ergodic HMM was used to estimate the speech probability, and an all-phone model was also tried as a speech probability estimator for comparison. An HMM-based connected word recognition algorithm was employed to generate recognition hypotheses, which were then scored according to the matching measures above. An ART 200-sentence English speech database was used for the experiments. Evaluation was conducted from several points of view, including recognition rate and word-end detection power. The results show that the mutual information computed with an ergodic HMM significantly outperforms the likelihood as an acoustic matching measure.
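As a brief illustrative sketch of the normalization described above (the notation here is ours, not taken from the paper): for a recognition hypothesis $W$ and a speech segment $X$, the normalized score can be written as the pointwise mutual information

\[
I(X; W) \;=\; \log \frac{P(X \mid W)}{P(X)} \;=\; \log P(X \mid W) \;-\; \log P(X),
\]

where $\log P(X \mid W)$ is the ordinary HMM likelihood of the hypothesis and $\log P(X)$ is the probability of the speech segment itself, estimated in this work by an ergodic HMM (or, for comparison, an all-phone model). Since $\log P(X)$ depends only on the segment, subtracting it removes the segment-dependent offset and makes scores of hypotheses spanning different segments directly comparable.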