This paper describes a real-time speech-recognition system that employs adaptive threshold logic elements called “Adalines.” Time-normalized digital patterns representing the time-frequency spectrum are obtained from the amplitude-normalized outputs of 8 bandpass filters. The Adaline networks that classify these speech patterns are simulated on an IBM 1620 computer. In a recent experiment, the Adalines were trained on 8 samples of each of 16 phonetically balanced words. Once these training samples were classified correctly, different samples spoken by the same speaker were identified correctly in 112 trials, with no errors. When tested on new speakers of the same sex, the machine achieved an average recognition rate of 90%; including the new speaker(s) in the training group, however, improves performance substantially. Using adaptive networks as pattern classifiers gives the system great flexibility, because the classification system can be completely redesigned simply by retraining. The system has successfully carried out many speech-recognition tasks, including recognizing the 10 digits spoken in 4 different languages and identifying different speakers saying the same words.
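The abstract does not spell out the preprocessing or the training rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the Widrow-Hoff least-mean-squares (LMS) rule, the rule classically used to train an Adaline, and it invents a simple pipeline: per-frame peak normalization of the 8 filter outputs (amplitude normalization), averaging into a fixed number of equal-length time segments (time normalization), and thresholding each channel against its segment mean to get a ±1 "digital" pattern. The names `to_binary_pattern`, `synth_frames`, `N_SEGMENTS`, and the synthetic words "A" and "B" are hypothetical, and the filter-bank data is synthetic, not speech. A single Adaline separates only two classes; how the paper's networks combine units to handle 16 words is not described in the abstract.

```python
import random

N_FILTERS = 8    # from the abstract: 8 bandpass filters
N_SEGMENTS = 10  # hypothetical: number of time slices after time normalization


def to_binary_pattern(frames, n_segments=N_SEGMENTS):
    """Hypothetical preprocessing. `frames` is a list of per-frame filter
    energies, each a list of N_FILTERS non-negative floats."""
    # Amplitude normalization (assumed): scale each frame by its largest output.
    norm = []
    for f in frames:
        peak = max(f) or 1.0  # guard against an all-zero (silent) frame
        norm.append([x / peak for x in f])
    # Time normalization (assumed): average n_segments equal spans, so every
    # utterance yields a pattern of the same size regardless of its duration.
    n = len(norm)
    pattern = []
    for s in range(n_segments):
        lo = s * n // n_segments
        hi = max((s + 1) * n // n_segments, lo + 1)
        span = norm[lo:hi]
        avg = [sum(fr[c] for fr in span) / len(span) for c in range(N_FILTERS)]
        # Quantization (assumed): +1 if a channel exceeds the segment mean.
        mean = sum(avg) / N_FILTERS
        pattern.extend([1.0 if a > mean else -1.0 for a in avg])
    return pattern


class Adaline:
    """Adaptive linear element: weighted sum plus bias, then a sign threshold.
    Trained with the Widrow-Hoff LMS rule, which corrects the weights in
    proportion to the error of the linear output (before thresholding)."""

    def __init__(self, n_inputs, mu=0.01):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.mu = mu  # step size; must be small relative to 1 / |x|^2

    def linear(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def classify(self, x):
        return 1 if self.linear(x) >= 0 else -1

    def train(self, samples, labels, epochs=50):
        for _ in range(epochs):
            for x, d in zip(samples, labels):
                err = d - self.linear(x)  # desired minus analog output
                self.b += self.mu * err
                self.w = [wi + self.mu * err * xi for wi, xi in zip(self.w, x)]


def synth_frames(word, rng):
    """Hypothetical stand-in for filter-bank output (not real speech).
    Word "A": low bands loud throughout. Word "B": high bands loud in the
    first half, low bands loud in the second half."""
    n = rng.randint(30, 60)  # utterances differ in length
    frames = []
    for i in range(n):
        low_band_loud = word == "A" or i >= n // 2
        frame = []
        for c in range(N_FILTERS):
            loud = (c < 4) == low_band_loud
            frame.append((1.0 if loud else 0.1) + rng.uniform(0.0, 0.2))
        frames.append(frame)
    return frames


if __name__ == "__main__":
    rng = random.Random(0)
    train_x, train_d = [], []
    for _ in range(8):  # 8 training samples per word, echoing the abstract
        for word, target in (("A", 1), ("B", -1)):
            train_x.append(to_binary_pattern(synth_frames(word, rng)))
            train_d.append(target)
    unit = Adaline(N_FILTERS * N_SEGMENTS)
    unit.train(train_x, train_d)
    test = [to_binary_pattern(synth_frames(w, rng)) for w in "AABB"]
    print([unit.classify(x) for x in test])  # expected: [1, 1, -1, -1]
```

Because words of different lengths are mapped onto the same number of segments, the two synthetic words differ only in the first half of the pattern, and the LMS rule learns weights that separate them. "Redesigning" the classifier for a new vocabulary here means calling `train` on new patterns, with no change to the code.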