Abstract

Speech that is corrupted by nonstationary interference but still contains segments usable for applications such as speaker identification or speech recognition is referred to as usable speech. A common example of nonstationary interference occurs when more than one person is talking at the same time, which is known as co-channel speech. In general, these speech processing applications do not work in co-channel environments; however, they can work on the extracted usable segments. Unfortunately, currently available speech measures detect only about 75% of the total usable speech. The first source of error is that no single feature can accurately identify all the speech characteristics; this can be addressed by using a Gaussian mixture model (GMM) based classifier to combine several speech features. A second source of error is that current speech measures treat each frame of co-channel data independently of the decisions made on adjacent frames; this can be addressed by using a hidden Markov model (HMM) to incorporate the context-dependent information in adjacent frames. Using this approach, we were able to detect 84% of the usable speech with a 16% false alarm rate.
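
The abstract only outlines the approach, so the Python sketch below is an assumption-laden illustration rather than the authors' implementation: one GMM is fitted on frames labeled usable and one on frames labeled unusable, their per-frame log-likelihood scores are stacked, and a two-state HMM decoded with Viterbi supplies the context dependence between adjacent frames. The feature vectors, mixture order `n_mix`, and self-transition probability `p_stay` are placeholders, not values from the paper.

```python
# Minimal sketch: GMM frame scoring followed by two-state HMM (Viterbi) smoothing.
# Feature extraction, mixture orders, and transition probabilities are
# illustrative assumptions, not taken from the paper.
import numpy as np
from sklearn.mixture import GaussianMixture


def train_frame_gmms(usable_feats, unusable_feats, n_mix=8, seed=0):
    """Fit one GMM per class on labeled frame-level feature vectors
    (e.g., several usable-speech measures stacked per frame)."""
    gmm_usable = GaussianMixture(n_components=n_mix, covariance_type="diag",
                                 random_state=seed).fit(usable_feats)
    gmm_unusable = GaussianMixture(n_components=n_mix, covariance_type="diag",
                                   random_state=seed).fit(unusable_feats)
    return gmm_usable, gmm_unusable


def viterbi_smooth(loglik, log_trans, log_prior):
    """Viterbi decoding over per-frame class log-likelihoods.
    loglik has shape (T, 2) with columns [unusable, usable]."""
    T, S = loglik.shape
    delta = np.full((T, S), -np.inf)   # best path score ending in each state
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = log_prior + loglik[0]
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] + log_trans[:, s]
            psi[t, s] = np.argmax(scores)
            delta[t, s] = scores[psi[t, s]] + loglik[t, s]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):     # backtrack along the best path
        path[t] = psi[t + 1, path[t + 1]]
    return path                        # 1 = usable frame, 0 = unusable


def detect_usable_frames(frames, gmm_usable, gmm_unusable, p_stay=0.95):
    """Score each frame with both GMMs, then smooth the decisions with a
    two-state HMM whose self-transition probability p_stay encodes the
    tendency of adjacent frames to share the same label."""
    loglik = np.column_stack([gmm_unusable.score_samples(frames),
                              gmm_usable.score_samples(frames)])
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))
    log_prior = np.log(np.array([0.5, 0.5]))
    return viterbi_smooth(loglik, log_trans, log_prior)
```

Taking a frame-by-frame argmax over the two GMM scores would reproduce the independent-decision behaviour described in the abstract; the Viterbi pass is what lets a confident decision on one frame influence its more ambiguous neighbours.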
