A training procedure for isolated word recognition systems

S Furui

doi:10.1109/tassp.1980.1163393

Abstract

A procedure has been devised to reduce the amount of training required for a phoneme-based speaker-dependent word recognition system while still maintaining performance. Each new speaker is required to provide utterances of only a fraction of the entire vocabulary as a training set. A set of transformation rules is used to estimate phoneme templates for the entire vocabulary from the phoneme templates included in the training set. The transformation rules are obtained in a pretraining procedure in which a group of speakers provides utterances of the entire vocabulary, and multiple regression analysis (MRA) is used to obtain linear estimates of the entire phoneme template set in terms of the set designated as training templates. This group of speakers is generally distinct from the group of training speakers. Thus, since the transformation rules are established independently of the training speakers, the entire procedure can be considered a hybrid speaker-dependent/speaker-independent system. Recognition experiments using spoken digits uttered by 30 male and female speakers and 67 airport names uttered by 30 male speakers confirmed the effectiveness of this training procedure. A mean recognition accuracy of 98.2 percent was obtained for the latter utterance set after a 12-word training procedure.
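The pretraining and estimation steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the feature dimensions, the synthetic data, and the function names fit_transformation and estimate_templates are all assumptions made for illustration, and ordinary least squares with an intercept term stands in for the multiple regression analysis. Each speaker is represented by one flattened vector of training-template features and one flattened vector for the full template set.

```python
import numpy as np


def fit_transformation(train_feats, full_feats):
    """Fit linear rules full ~= [train, 1] @ W by least squares (hypothetical helper).

    train_feats: (n_speakers, d_train) training-template features per pretraining speaker
    full_feats:  (n_speakers, d_full)  full template-set features per pretraining speaker
    """
    # Append a constant column so the regression includes an intercept.
    X = np.hstack([train_feats, np.ones((train_feats.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, full_feats, rcond=None)
    return W  # shape (d_train + 1, d_full)


def estimate_templates(W, new_train_feats):
    """Estimate a new speaker's full template set from the training templates alone."""
    x = np.append(new_train_feats, 1.0)
    return x @ W


# Synthetic stand-in data (assumption): 40 pretraining speakers, noise-free linear relation.
rng = np.random.default_rng(0)
n_speakers, d_train, d_full = 40, 6, 15
true_map = rng.normal(size=(d_train + 1, d_full))
pre_train = rng.normal(size=(n_speakers, d_train))
pre_full = np.hstack([pre_train, np.ones((n_speakers, 1))]) @ true_map

# Pretraining: rules are learned once, from speakers other than the new user.
W = fit_transformation(pre_train, pre_full)

# Training: the new speaker supplies only the training-template features.
new_speaker = rng.normal(size=d_train)
estimated = estimate_templates(W, new_speaker)
```

In this sketch the transformation matrix W is fitted without any data from the new speaker, which mirrors the hybrid speaker-dependent/speaker-independent character noted in the abstract. Real template data would be noisy, so the fit would be approximate rather than exact as it is here.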
