Abstract

This paper describes an algorithm to decompose speech into tonal, transient, and residual components. The algorithm uses an MDCT-based hidden Markov chain model to isolate the tonal component and a wavelet-based hidden Markov tree model to isolate the transient component. We suggest that the auditory system, like the visual system, is probably sensitive to abrupt stimulus changes and that the transient component of speech may be particularly critical to speech perception. To test this suggestion, the transient component isolated by our algorithm was selectively amplified and recombined with the original speech to generate enhanced speech, whose energy was adjusted to equal that of the original speech. The intelligibility of the original and enhanced speech was evaluated with eleven human subjects using the modified rhyme protocol. The word recognition rates show that the enhanced speech can substantially improve speech intelligibility at low SNR levels (8% at -15 dB, 14% at -20 dB, and 18% at -25 dB).
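
The recombination step described above can be illustrated with a minimal sketch. It assumes the transient component has already been isolated and is given as a list of samples; the hidden Markov models used for the decomposition are not implemented here. The function names `energy` and `enhance`, the toy signals, and the gain of 4.0 are hypothetical choices for illustration, not values taken from the paper.

```python
import math


def energy(signal):
    """Return the sum of squared samples."""
    return sum(x * x for x in signal)


def enhance(original, transient, gain):
    """Add an amplified transient component to the original signal,
    then rescale so the result has the same energy as the original.

    `gain` is a hypothetical amplification factor; the abstract does
    not state the value used by the authors.
    """
    if len(original) != len(transient):
        raise ValueError("signals must have the same length")
    mixed = [o + gain * t for o, t in zip(original, transient)]
    mixed_energy = energy(mixed)
    if mixed_energy == 0.0:
        return mixed  # nothing to rescale in an all-zero signal
    scale = math.sqrt(energy(original) / mixed_energy)
    return [scale * m for m in mixed]


# Toy data: a short "speech" frame with one abrupt onset at index 2.
original = [0.0, 0.5, 1.0, 0.2, -0.3, -0.1]
transient = [0.0, 0.0, 0.8, 0.0, 0.0, 0.0]
enhanced = enhance(original, transient, gain=4.0)
```

Because the output is rescaled to the original energy, the amplification changes only the relative weight of the transient samples, not the overall level of the signal.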
