Use of minimum duration and energy contour for phonemes to improve large vocabulary isolated-word recognition

V. Gupta, M. Lennig, P. Mermelstein, P. Kenny, P. F. Seitz, D. O'Shaughnessy

doi:10.1016/0885-2308(92)90028-3

Abstract

Many acoustic misrecognitions in our 86 000-word speaker-trained isolated-word recognizer are due to phonemic hidden Markov models (phoneme models) mapping to short segments of speech. When we force these models to map to longer segments corresponding to the observed minimum durations for the phonemes, the likelihood of the incorrect phoneme sequences drops dramatically. This drop in the likelihood of the incorrect words results in a significant reduction in the acoustic recognition error rate, by which we mean the recognition error rate when every word in the vocabulary is considered a priori equally likely. Even in cases where acoustic recognition performance is unchanged, the likelihood of the correct word choice improves relative to the incorrect word choices, resulting in a significant reduction in the recognition error rate with the language model. On nine speakers, the error rate for acoustic recognition falls from 18·6 to 17·3%, while the error rate with the language model falls from 9·2 to 7·2%.

We have also improved the phoneme models by correcting the segmentation of the phonemes in the training set. During training, the boundaries between phonemes are not marked accurately, so we use energy to correct them. Applying an energy threshold improves the segment boundaries between stops and sonorants (vowels, liquids and glides), between fricatives and sonorants, between affricates and sonorants, and between breath noise and sonorants. Training the phoneme models on these resegmented phonemes yields models that increase recognition accuracy significantly. On two speakers, the error rate for acoustic recognition falls from 26·5 to 23·1%, while the error rate with the language model falls from 11·3 to 8·8%. This reduction is in addition to the error rate reductions obtained by imposing minimum duration constraints. The overall reduction in errors for these two speakers using minimum durations and energy thresholds is from 27·3 to 23·1% for acoustic recognition, and from 14·3 to 8·8% with the language model.
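The effect of a minimum duration constraint can be illustrated with a small sketch. This is not the authors' recognizer: it assumes a simplified segmental alignment in which each phoneme contributes one log-likelihood per frame, rather than a full hidden Markov model, and the function name `align_score`, its arguments, and all the numbers in the toy example are hypothetical, chosen only to show how an incorrect word can win by squeezing a phoneme into a single frame and lose once that is forbidden.

```python
import math

def align_score(loglik, min_dur):
    """Best alignment of a phoneme sequence to frames under minimum durations.

    loglik[k][t] is the log-likelihood of frame t under the k-th phoneme of
    the word (a simplifying assumption: one score per phoneme per frame).
    min_dur[k] is the minimum number of frames phoneme k must cover.
    Returns (total log-likelihood, durations), or (-inf, None) if the word
    cannot fit into the available frames.
    """
    n_ph, n_fr = len(loglik), len(loglik[0])
    # prefix[k][t] = sum of loglik[k][0:t], so any segment score is a difference
    prefix = []
    for row in loglik:
        acc = [0.0]
        for v in row:
            acc.append(acc[-1] + v)
        prefix.append(acc)

    neg = -math.inf
    # best[k][t]: best score for the first k phonemes covering the first t frames
    best = [[neg] * (n_fr + 1) for _ in range(n_ph + 1)]
    back = [[0] * (n_fr + 1) for _ in range(n_ph + 1)]
    best[0][0] = 0.0
    for k in range(1, n_ph + 1):
        d_min = max(1, min_dur[k - 1])
        for t in range(1, n_fr + 1):
            # phoneme k covers frames s..t-1, so its duration t - s >= d_min
            for s in range(0, t - d_min + 1):
                if best[k - 1][s] == neg:
                    continue
                cand = best[k - 1][s] + prefix[k - 1][t] - prefix[k - 1][s]
                if cand > best[k][t]:
                    best[k][t] = cand
                    back[k][t] = s
    if best[n_ph][n_fr] == neg:
        return neg, None

    # Trace back the segment boundaries to recover each phoneme's duration
    durations, t = [], n_fr
    for k in range(n_ph, 0, -1):
        s = back[k][t]
        durations.append(t - s)
        t = s
    durations.reverse()
    return best[n_ph][n_fr], durations

# Toy data (invented for illustration): six frames, the spoken word is "A B".
ll_a = [-1.0, -1.0, -1.0, -5.0, -5.0, -5.0]
ll_b = [-5.0, -5.0, -5.0, -1.0, -1.0, -1.0]
ll_x = [-5.0, -5.0, 0.0, -5.0, -5.0, -5.0]   # X matches only frame 2 well

# Without a real constraint, the wrong word "A X B" beats the right word "A B".
wrong_free = align_score([ll_a, ll_x, ll_b], [1, 1, 1])    # (-5.0, [2, 1, 3])
right_free = align_score([ll_a, ll_b], [1, 1])             # (-6.0, [3, 3])

# With a two-frame minimum per phoneme, the wrong word's score drops sharply.
wrong_min = align_score([ll_a, ll_x, ll_b], [2, 2, 2])     # (-9.0, [2, 2, 2])
right_min = align_score([ll_a, ll_b], [2, 2])              # (-6.0, [3, 3])
```

In this toy case the correct word's score is unchanged by the constraint, while the incorrect word loses its advantage because phoneme X can no longer be mapped to a single well-matching frame.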

Journal: Computer Speech & Language
Publication Date: Oct 1, 1992
Citations: 1
