Abstract

In this paper, we propose a novel model for incorporating voicing information into a speech recognition system. The voicing information is estimated by a novel method that provides it for each filter-bank channel without requiring any information about the fundamental frequency. A Viterbi-style training procedure is employed to estimate the voicing probability of each mixture at each HMM state. Experiments are performed on noisy speech data from the Aurora 2 database. Significant performance improvements are achieved at low SNRs when the voicing information is incorporated into the standard model and into two models that already compensate for the effect of noise.
