Abstract

Minimum duration constraints and energy thresholds for phonemes were used to increase the recognition accuracy of an 86,000-word speaker-trained isolated word recognizer. Minimum duration constraints force the phoneme models to map to acoustic segments longer than the duration minima for the phonemes. Such constraints significantly lower the likelihoods of many incorrect word choices, improving the accuracy of both acoustic recognition and recognition with the language model. The phoneme models were also improved by correcting the segmentation of the phonemes in the training set. During training, the boundaries between phonemes are not marked accurately, so energy is used to correct them. Applying an energy threshold improves the segment boundaries between stops and sonorants (vowels, liquids, and glides), between fricatives and sonorants, between affricates and sonorants, and between breath noise and sonorants. For two speakers, minimum durations and energy thresholds together reduce the error rate from 27.3% to 23.1% for acoustic recognition and from 14.3% to 8.8% with the language model.
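The effect of a minimum duration constraint can be illustrated with a small alignment sketch. This is not the recognizer described in the abstract, whose internals are not given here. It is a minimal dynamic-programming example under stated assumptions: per-frame log-likelihoods for each phoneme of a candidate word are already available, and every minimum duration is at least one frame. The function name `align_with_min_durations` and its parameters are hypothetical.

```python
import math


def align_with_min_durations(loglik, min_dur):
    """Best left-to-right alignment of a phoneme sequence to frames.

    loglik[t][p] is the (assumed given) log-likelihood of frame t under the
    p-th phoneme of the candidate word. min_dur[p] is the minimum number of
    frames that phoneme p must occupy (assumed >= 1).

    Returns (score, bounds), where bounds[p] = (start, end) with end
    exclusive. If the word cannot fit in the frames, returns (-inf, []).
    """
    num_frames = len(loglik)
    num_phones = len(min_dur)
    neg_inf = -math.inf

    # prefix[p][t] = sum of loglik[0..t-1][p], so a segment score is a difference.
    prefix = [[0.0] * (num_frames + 1) for _ in range(num_phones)]
    for p in range(num_phones):
        for t in range(num_frames):
            prefix[p][t + 1] = prefix[p][t] + loglik[t][p]

    # best[p][t] = best score for the first p phonemes covering the first t frames.
    best = [[neg_inf] * (num_frames + 1) for _ in range(num_phones + 1)]
    back = [[0] * (num_frames + 1) for _ in range(num_phones + 1)]
    best[0][0] = 0.0

    for p in range(1, num_phones + 1):
        for t in range(1, num_frames + 1):
            # Only durations d >= min_dur are allowed: this is the constraint.
            for d in range(min_dur[p - 1], t + 1):
                prev = best[p - 1][t - d]
                if prev == neg_inf:
                    continue
                score = prev + prefix[p - 1][t] - prefix[p - 1][t - d]
                if score > best[p][t]:
                    best[p][t] = score
                    back[p][t] = d

    if best[num_phones][num_frames] == neg_inf:
        return neg_inf, []

    # Trace the chosen durations back from the last frame.
    bounds = []
    t = num_frames
    for p in range(num_phones, 0, -1):
        d = back[p][t]
        bounds.append((t - d, t))
        t -= d
    bounds.reverse()
    return best[num_phones][num_frames], bounds
```

Raising a phoneme's minimum duration can only lower or leave unchanged the best alignment score of a word. A candidate word that relied on squeezing a phoneme into one or two frames is therefore penalized, which is the mechanism the abstract credits for lowering the likelihoods of incorrect word choices.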
