Recently, hidden Markov models (HMMs) have been applied successfully to both isolated and connected word recognition. However, when the same formulation is adopted for recognition of more confusable vocabularies, such as the English alphabet, the recognition performance is often less satisfactory. One main reason is that robustness issues, such as model validity, recognition parameter selection, model parameter initialization, model parameter estimation, training sample size, and the incorporation of durational information, can no longer be ignored. In this paper, a stochastic segment model (SSM), which is a simplified HMM, is proposed for speech recognition. Three specific robustness issues are then discussed, namely the choice of observation densities, the initialization of model parameters, and the incorporation of durational information. In a step-by-step attempt to address those issues, it was found that the same SSM formulation can still be adopted if acoustic and phonetic knowledge about the vocabulary is taken into account in the model parameter estimation and recognition phases. Testing on the 39-word English alpha-digit vocabulary indicates that the recognition performance, based on conventional HMM techniques, can be significantly improved if model parameters are adequately initialized and durational information is properly incorporated.