The performance of isolated word speech recognition system has steadily improved over time as we learn more about how to represent the significant events in speech, and how to capture these events via appropriate analysis procedures and training algorithms. In particular, algorithms based on both template matching (via dynamic time warping (DTW) procedures) and hidden Markov models (HMMs) have been developed which yield high accuracy on several standard vocabularies, including the 10 digits (zero to nine) and the set of 26 letters of the English alphabet (A-Z). Results are given showing currently attainable performance of a laboratory system for both template-based (DTW) and HMM-based recognizers, operating in both speaker trained and speaker independent modes, on the digits and the alphabet vocabularies using telephone recordings. We show that the average error rates of these systems, on standard vocabularies, are significantly lower than those reported several years back on the exact same databases, thereby reflecting the progress which has been made in all aspects of the speech recognition process.
Read full abstract