A comparison of three feature vector clustering procedures in a speech recognition paradigm

L Niles, N Dixon, H Silverman

doi:10.1109/icassp.1983.1172085

Abstract

One possible approach to achieving talker independence in discrete utterance recognition (DUR) is to classify speech feature vectors with a talker-independent clustering procedure. Many clustering algorithms could serve this purpose. This work studied the characteristics of three clustering procedures applied to speech feature vectors: Agglomerative, Basic Isodata, and a 'Biased Mean' modification of Basic Isodata. The feature extractor was a six-channel filterbank similar to those used in DUR systems. The speech data were derived from a total of 19 repetitions of a ten-word vocabulary, spoken by 16 different talkers. Various distance functions and feature vector representations were employed. Agglomerative clustering did not produce clusters that corresponded to any apparent classification of speech events. The Biased Mean Isodata procedure did not converge and was therefore not useful. The Basic Isodata algorithm produced clusters that were, to varying degrees, identifiable with classes of speech sounds. Simple classifiers for three such classes, built from these clusters, classified feature vectors with error rates of 5-10%. The best results were obtained with feature vectors consisting of the log filter channel energies. These results are good enough to encourage further development of cluster-based feature vector classifiers.
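The abstract names Basic Isodata without restating it. The sketch below assumes "Basic Isodata" refers to the iterative nearest-mean procedure usually given under that name: choose initial means, assign each vector to its closest mean, recompute the means, and repeat until no assignment changes (essentially batch k-means). The helper names (`log_channel_energies`, `basic_isodata`, `nearest_mean`), the squared Euclidean distance, the first-k initialization, and the log floor are illustrative assumptions, not the authors' implementation. The 'Biased Mean' modification is not described in the abstract and is not modeled. `nearest_mean` also serves as one plausible form of a simple cluster-based classifier.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def log_channel_energies(energies: Sequence[float], floor: float = 1e-10) -> Vector:
    """Map raw filterbank channel energies to log energies.

    The floor value is an assumption; it only guards against log(0).
    """
    return [math.log(max(e, floor)) for e in energies]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, one of many possible distance functions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_mean(v: Sequence[float], means: Sequence[Sequence[float]]) -> int:
    """Index of the closest mean (ties go to the lowest index)."""
    return min(range(len(means)), key=lambda j: squared_distance(v, means[j]))


def basic_isodata(
    vectors: Sequence[Sequence[float]],
    k: int,
    max_iter: int = 100,
    initial_means: Optional[Sequence[Sequence[float]]] = None,
) -> Tuple[List[Vector], List[int], bool]:
    """Iterative nearest-mean reassignment (batch k-means style).

    Returns (means, labels, converged); converged is False when max_iter
    passes finish while assignments are still changing.
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Step 1: initial means; by default (an assumption) the first k vectors.
    seeds = initial_means if initial_means is not None else vectors[:k]
    means = [list(m) for m in seeds]
    if len(means) != k:
        raise ValueError("need exactly k initial means")
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Step 2: assign every vector to the class of its closest mean.
        new_labels = [nearest_mean(v, means) for v in vectors]
        if new_labels == labels:
            # Step 4: no vector changed class, so no mean can change either.
            return means, labels, True
        labels = new_labels
        # Step 3: recompute each mean as the average of its member vectors.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels, False


if __name__ == "__main__":
    # Made-up six-channel frames (not data from the study): quiet, loud, quiet, loud.
    frames = [
        [0.01, 0.02, 0.01, 0.02, 0.01, 0.02],
        [100.0, 80.0, 60.0, 5.0, 2.0, 1.0],
        [0.02, 0.01, 0.02, 0.01, 0.02, 0.01],
        [90.0, 85.0, 55.0, 4.0, 3.0, 1.0],
    ]
    feats = [log_channel_energies(f) for f in frames]
    means, labels, converged = basic_isodata(feats, k=2)
    print(labels, converged)  # [0, 1, 0, 1] True
```

Returning an explicit `converged` flag makes a non-terminating variant visible instead of silently handing back the last iterate; the `max_iter` cap of 100 is arbitrary.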
