The results of an experiment on talker-dependent, connected-speech recognition of 10 Estonian words are reported. The words differ in so-called distinctive quantity, the major acoustic correlate of which is duration. All were consonant-vowel-vowel words, and each word was modeled with four variable-duration states. The words were spoken, and recognized, in sentence pairs of the form 'Did you say (word 1, word 2, word 3)? No, I said (word 4, word 5, word 6)'. The test sentences were spoken either at the same rate as the training sentences or at a much faster rate. In a first set of experiments, the likelihood of the spectral match was the only type of factor in the dynamic-programming best-path score. The best word recognition results obtained were 8% and 64% on the slow and fast test sentences, respectively. In a second set of experiments, which also used probabilities or likelihoods of state durations, the best word recognition results were 86% and 68%, respectively. It is concluded that speech rate can be a major problem for connected recognition of these words, and that in these experiments the problem has not been completely overcome, even using the likelihoods of the state duration ratios.