Speaker-Independent English Consonant and Japanese Word Recognition by a Stochastic Dynamic Time Warping Method

Seiichi Nakagawa, Hirobumi Nakanishi

doi:10.1080/03772063.1988.11436710

Abstract

This paper proposes a stochastic dynamic time warping (DTW) method for speaker-independent recognition and examines its use for speaker-independent consonant recognition and large-vocabulary word recognition. The method replaces the local distances of standard DTW with conditional probabilities and the path costs with transition probabilities, which relates it to both standard DTW and the hidden Markov model. For word recognition, whole-word templates are constructed by concatenating syllable templates taken from spoken words. The reference patterns were obtained from 216 words uttered by 30 male speakers, and recognition was tested on a different set of 200 words uttered by 10 other speakers. On these 200 words, the standard DTW method for speaker-independent recognition gave an average word recognition rate of 89.3%; the proposed stochastic DTW method improved the rate to 92.9%.
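As a rough illustration of the idea in the abstract, the sketch below scores an input utterance against a template by summing log conditional probabilities and log transition probabilities along the best warping path, where standard DTW would sum local distances and path costs. The abstract does not give the form of the probabilities or the allowed local paths, so everything specific here is an assumption made for illustration: the diagonal Gaussian per reference frame, the three moves (advance 0, 1, or 2 reference frames), and the names `log_gauss`, `stochastic_dtw`, and `recognize`.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x (assumed model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def stochastic_dtw(frames, template, log_trans):
    """Best-path log score of `frames` against `template`.

    frames:    list of feature vectors (the input utterance).
    template:  list of (mean, var) pairs, one per reference frame.
    log_trans: log probabilities of advancing 0, 1, or 2 reference
               frames per input frame (hypothetical local paths).
    """
    n_in, n_ref = len(frames), len(template)
    neg_inf = float("-inf")
    score = [[neg_inf] * n_ref for _ in range(n_in)]
    # The path is assumed to start at the first frame of both sequences.
    score[0][0] = log_gauss(frames[0], *template[0])
    for i in range(1, n_in):
        for j in range(n_ref):
            best = neg_inf
            for step, lt in enumerate(log_trans):
                if j - step >= 0 and score[i - 1][j - step] > neg_inf:
                    best = max(best, score[i - 1][j - step] + lt)
            if best > neg_inf:
                score[i][j] = best + log_gauss(frames[i], *template[j])
    # The path is assumed to end at the last frame of both sequences.
    return score[n_in - 1][n_ref - 1]


def recognize(frames, templates, log_trans):
    """Return the label of the template with the highest log score."""
    return max(
        templates,
        key=lambda label: stochastic_dtw(frames, templates[label], log_trans),
    )


# Toy one-dimensional data, invented purely for this example.
LOG_TRANS = (math.log(0.25), math.log(0.5), math.log(0.25))
TEMPLATES = {
    "a": [([0.0], [1.0]), ([1.0], [1.0]), ([2.0], [1.0])],
    "b": [([5.0], [1.0]), ([6.0], [1.0]), ([7.0], [1.0])],
}
FRAMES = [[0.1], [0.9], [2.1]]
BEST = recognize(FRAMES, TEMPLATES, LOG_TRANS)
```

In this toy setup `BEST` is the label whose template lies closest to the input frames. Working in the log domain turns the products of probabilities into sums, so the recursion keeps the same shape as distance-based DTW with `max` in place of `min`.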

Full Text