Abstract

Current speech recognition techniques can usually identify the vowels within a word accurately, because vowels are of relatively long duration and change character slowly. Within the class of consonants, the stop consonants are often hard to differentiate because of their short duration and transient nature. In isolated word recognition, a time normalization step is required before the unknown word can be compared with reference word templates. Recursive estimation techniques should make transient sound recognition more accurate and eliminate the need for time normalization. The approach presented here uses the recursive exact least-squares ladder-form estimation algorithm to determine an autoregressive model (and hence a spectral representation) of the speech. This recursive algorithm updates its representation at every speech sample using exponentially weighted past data, so it can track spectral changes in the speech without much time smearing. With an appropriate multidimensional speech parameterization, the space spanned by the various speech sounds should admit subdivisions that can be associated with each of the phonemes. Cluster regions for vowels in the F1-F2 space are already known. A more extensive parameterization suited to transition-type sounds should improve the accuracy of consonant recognition and remove the requirement for time normalization. The recursive ladder technique is explained and its estimation behavior illustrated. The voiced stops /b/, /d/ and /g/, each followed by different vowels, were examined to determine the usefulness of recursive spectral estimation in differentiating these very similar transient sounds.
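
To make the idea of a per-sample, exponentially weighted update concrete, here is a minimal sketch. It is not the ladder-form algorithm described in the abstract: it uses the conventional transversal recursive least-squares (RLS) recursion with a forgetting factor, which minimizes a comparable exponentially weighted prediction-error criterion. The function name `rls_ar_track` and the parameters `order`, `lam` and `delta` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def rls_ar_track(x, order=2, lam=0.98, delta=100.0):
    """Track AR coefficients at every sample with exponentially weighted RLS.

    Assumed illustration only: conventional (transversal) RLS, not the
    ladder-form recursion. `lam` is the forgetting factor (0 < lam <= 1);
    `delta` scales the initial inverse correlation matrix.
    """
    x = np.asarray(x, dtype=float)
    w = np.zeros(order)                  # current AR coefficient estimate
    P = delta * np.eye(order)            # inverse of the weighted correlation matrix
    history = np.zeros((len(x), order))  # coefficient estimate after each sample

    for n in range(len(x)):
        # Regressor: the previous `order` samples, most recent first,
        # with zeros assumed before the start of the signal.
        phi = np.array([x[n - k] if n - k >= 0 else 0.0
                        for k in range(1, order + 1)])
        err = x[n] - w @ phi                       # a priori prediction error
        gain = P @ phi / (lam + phi @ P @ phi)     # RLS gain vector
        w = w + gain * err                         # coefficient update
        P = (P - np.outer(gain, phi @ P)) / lam    # discount older data by lam
        history[n] = w

    return history
```

Calling `rls_ar_track(samples, order=10, lam=0.98)` on a one-dimensional array of speech samples would return one coefficient vector per sample. A smaller `lam` forgets old data faster, which follows transients more closely at the cost of noisier estimates.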
