A simplified, robust training procedure for speaker trained, isolated word recognition systems

L R Rabiner, J G Wilpon

doi:10.1121/1.385120

Abstract

One of the most important operations in isolated word speech recognition systems is the method used to obtain the word reference templates. For speaker trained systems, the techniques that have been used include casual training, averaging, and use of statistical pattern recognition clustering methods. In a recent study, Rabiner and Wilpon showed that the statistical techniques, when combined with the technique of averaging the autocorrelation coefficients of all tokens within the cluster, provided a reliable, robust set of reference templates. The only drawback to this method was the extensive, burdensome training required for the statistical analysis. Since the statistical training method could not be used in most practical situations, techniques were investigated for obtaining a simplified, robust training procedure which would incorporate many of the ideas of the statistical approach. Such a training method is described in this paper. The advantages of this new training procedure over the previously used casual training include ease of use, robustness of the templates, minimized computation in the recognition algorithm, normalized durations of all references, and recognition accuracy comparable to that of previous systems with double the number of templates.
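
As a rough illustration of the template-averaging idea mentioned in the abstract, the following Python sketch averages the per-frame autocorrelation coefficients of several tokens of the same word after bringing them to a common duration. The function names (`normalize_duration`, `average_template`), the fixed length of 40 frames, and the use of simple linear interpolation for time normalization are all assumptions made for illustration; the abstract does not specify how tokens are aligned or how durations are normalized, so this should not be read as the authors' procedure.

```python
import numpy as np

def normalize_duration(token, n_frames):
    """Linearly resample a (frames x coefficients) token to n_frames frames.

    Assumption: plain linear time normalization stands in for whatever
    alignment the actual training procedure uses.
    """
    token = np.asarray(token, dtype=float)
    src = np.linspace(0.0, 1.0, num=token.shape[0])  # original frame positions
    dst = np.linspace(0.0, 1.0, num=n_frames)        # target frame positions
    # Interpolate each coefficient track independently along the time axis.
    tracks = [np.interp(dst, src, token[:, k]) for k in range(token.shape[1])]
    return np.stack(tracks, axis=1)

def average_template(tokens, n_frames=40):
    """Average the autocorrelation coefficients of several tokens, frame by frame.

    Each token is an array of shape (frames, coefficients); tokens may differ
    in duration but must share the same number of coefficients.
    """
    resampled = [normalize_duration(t, n_frames) for t in tokens]
    return np.mean(resampled, axis=0)

# Hypothetical usage: two tokens of the same word with different durations,
# each frame holding 9 autocorrelation coefficients.
rng = np.random.default_rng(0)
token_a = rng.normal(size=(30, 9))
token_b = rng.normal(size=(50, 9))
template = average_template([token_a, token_b], n_frames=40)
```

Under these assumptions every reference template has the same number of frames, which loosely mirrors the "normalized durations of all references" listed among the advantages.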
