Abstract
Levinson et al. have recently described a powerful set of techniques for clustering multiple replications of words spoken by different talkers into a set of composite reference templates. These techniques have been incorporated into a speaker-independent, isolated word recognition system. The vocabulary we tested consists of 39 words: the letters of the alphabet, the digits 0–9, and the three cueing words STOP, ERROR, and REPEAT. This vocabulary is useful for a wide range of applications of automatic word recognition. The features used for recognition are a set of eight-pole LPC coefficients measured every 15 ms on speech from a standard telephone line. The distance measure is the log likelihood ratio originally proposed by Itakura. Several variations of a dynamic time warping algorithm have been incorporated into the system and tested. The clustering analysis yielded from 2 to 12 clusters per word. The decision rule is a generalized K-nearest neighbor (KNN) rule. Recognition accuracies comparable to those of speaker-dependent isolated word recognition systems have been obtained.
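The matching and decision steps named above can be illustrated with a minimal sketch. It is not the system described in the abstract: it assumes a basic unconstrained dynamic time warping recursion, substitutes a Euclidean frame distance for Itakura's log likelihood ratio, and assumes the KNN rule scores each word by the average of its K smallest template distances. The names `dtw_distance`, `knn_recognize`, and `euclidean` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Frame = Sequence[float]          # one feature vector (e.g. LPC-derived)
Utterance = Sequence[Frame]      # a sequence of frames


def euclidean(a: Frame, b: Frame) -> float:
    # Stand-in frame distance (assumption); the abstract uses
    # Itakura's log likelihood ratio instead.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test: Utterance, ref: Utterance,
                 frame_dist: Callable[[Frame, Frame], float] = euclidean) -> float:
    # Basic DTW with steps (1,0), (0,1), (1,1) and no slope constraints.
    n, m = len(test), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Normalize by the combined length so long and short words are comparable.
    return cost[n][m] / (n + m)


def knn_recognize(test: Utterance,
                  templates: Dict[str, List[Utterance]],
                  k: int = 2) -> str:
    # Assumed KNN rule: average the k smallest distances per word,
    # then pick the word with the smallest average.
    # Each word is assumed to have at least one template.
    scores = {}
    for word, refs in templates.items():
        nearest = sorted(dtw_distance(test, r) for r in refs)[:k]
        scores[word] = sum(nearest) / len(nearest)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Toy one-dimensional "features" for two words with two templates each.
    templates = {
        "A": [[[0.0], [1.0]], [[0.0], [1.1]]],
        "B": [[[5.0], [6.0]], [[5.0], [5.5]]],
    }
    print(knn_recognize([[0.0], [1.0]], templates, k=2))
```

Averaging over the K nearest templates of each word, rather than taking only the single best match, makes the decision less sensitive to one outlying template. With K = 1 the rule reduces to ordinary nearest-neighbor matching.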