Abstract
This paper presents the results of a speaker-independent, isolated-word speech recognition system developed for information access over the Australian public switched telephone network (PSTN). The recognition system is based on Continuous Density Hidden Markov Modelling (CDHMM). The speech database was collected over the PSTN from a wide variety of speakers in different geographical locations. The database contained a vocabulary of 55 words consisting of 41 country names and their variations plus a few control words. Tested on 100 other speakers (50 male and 50 female) with no grammar constraint, the system achieved an overall recognition rate of 97.3%. This paper describes the HMM training methodology, which consisted of three stages: hand-segmented seed model training, automatic word segmentation, and re-estimation. To facilitate future implementation of the recognition system in a DSP environment, a fast frame-synchronous Viterbi algorithm was implemented with no degradation in recognition performance. End-point detection is performed by combining a silence/noise model with the word models. For confusable word pairs, sub-word models are used to improve the recognition rate. A post-processing step further enhances performance: all ranked candidates from the Viterbi decoding are subjected to tests on the minimum word duration and on the likelihood difference between the first and second candidates.
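The post-processing tests described above can be illustrated with a minimal sketch. The abstract does not say how the two tests are combined or what thresholds are used, so everything here is an assumption made for illustration: the `Candidate` fields, the `post_process` function, the `min_duration` and `min_margin` parameters, and the choice to first discard candidates that are too short and then reject the utterance when the best survivor does not beat the runner-up by a sufficient log-likelihood margin.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One ranked hypothesis from Viterbi decoding (hypothetical structure)."""
    word: str
    log_likelihood: float   # Viterbi path score for this word model
    duration_frames: int    # frames aligned to the word, excluding silence/noise


def post_process(candidates: List[Candidate],
                 min_duration: int,
                 min_margin: float) -> Optional[str]:
    """Return the accepted word, or None if the utterance is rejected.

    Assumed logic (not taken from the paper):
      1. drop candidates whose aligned duration is below min_duration;
      2. rank the survivors by log-likelihood;
      3. accept the best one only if it leads the second-best by at
         least min_margin.
    """
    survivors = [c for c in candidates if c.duration_frames >= min_duration]
    if not survivors:
        return None

    ranked = sorted(survivors, key=lambda c: c.log_likelihood, reverse=True)
    best = ranked[0]

    # With a single survivor there is no runner-up to compare against.
    if len(ranked) > 1:
        margin = best.log_likelihood - ranked[1].log_likelihood
        if margin < min_margin:
            return None

    return best.word


# Illustrative values only; these scores and thresholds are invented.
hypotheses = [
    Candidate("japan", -100.0, 40),
    Candidate("spain", -130.0, 38),
    Candidate("yes", -90.0, 3),    # high score but implausibly short
]
accepted = post_process(hypotheses, min_duration=10, min_margin=5.0)
rejected = post_process(hypotheses, min_duration=10, min_margin=50.0)
```

In this sketch the short "yes" hypothesis is removed by the duration test, so "japan" is accepted when the required margin is small, and the utterance is rejected when the required margin exceeds the gap to "spain". A real system would tune both thresholds on held-out data.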