Abstract

Speech that is corrupted by nonstationary interference but still contains segments usable for applications such as speaker identification or speech recognition is referred to as usable speech. A common example of nonstationary interference occurs when more than one person is talking at the same time, which is known as co-channel speech. In general, these speech processing applications do not work in co-channel environments; however, they can work on the extracted usable segments. Unfortunately, currently available speech measures detect only about 75% of the total usable speech. The first source of error is that no single feature can accurately identify all the speech characteristics; this can be addressed by using a Gaussian mixture model (GMM) based classifier to combine several speech features. A second source of error is that current speech measures treat each frame of co-channel data independently of the decisions made on adjacent frames; this can be addressed by using a hidden Markov model (HMM) to incorporate the context-dependent information in adjacent frames. Using this approach, we were able to detect 84% of the usable speech with a 16% false alarm rate.
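
The abstract only outlines the approach, so the Python sketch below is an assumption-laden illustration rather than the authors' implementation: one GMM is fitted on frames labeled usable and one on frames labeled unusable, their per-frame log-likelihood scores are stacked, and a two-state HMM decoded with Viterbi supplies the context dependence between adjacent frames. The feature vectors, mixture order `n_mix`, and self-transition probability `p_stay` are placeholders, not values from the paper.

```python
# Minimal sketch: GMM frame scoring followed by two-state HMM (Viterbi) smoothing.
# Feature extraction, mixture orders, and transition probabilities are
# illustrative assumptions, not taken from the paper.
import numpy as np
from sklearn.mixture import GaussianMixture


def train_frame_gmms(usable_feats, unusable_feats, n_mix=8, seed=0):
    """Fit one GMM per class on labeled frame-level feature vectors
    (e.g., several usable-speech measures stacked per frame)."""
    gmm_usable = GaussianMixture(n_components=n_mix, covariance_type="diag",
                                 random_state=seed).fit(usable_feats)
    gmm_unusable = GaussianMixture(n_components=n_mix, covariance_type="diag",
                                   random_state=seed).fit(unusable_feats)
    return gmm_usable, gmm_unusable


def viterbi_smooth(loglik, log_trans, log_prior):
    """Viterbi decoding over per-frame class log-likelihoods.
    loglik has shape (T, 2) with columns [unusable, usable]."""
    T, S = loglik.shape
    delta = np.full((T, S), -np.inf)   # best path score ending in each state
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = log_prior + loglik[0]
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] + log_trans[:, s]
            psi[t, s] = np.argmax(scores)
            delta[t, s] = scores[psi[t, s]] + loglik[t, s]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):     # backtrack along the best path
        path[t] = psi[t + 1, path[t + 1]]
    return path                        # 1 = usable frame, 0 = unusable


def detect_usable_frames(frames, gmm_usable, gmm_unusable, p_stay=0.95):
    """Score each frame with both GMMs, then smooth the decisions with a
    two-state HMM whose self-transition probability p_stay encodes the
    tendency of adjacent frames to share the same label."""
    loglik = np.column_stack([gmm_unusable.score_samples(frames),
                              gmm_usable.score_samples(frames)])
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))
    log_prior = np.log(np.array([0.5, 0.5]))
    return viterbi_smooth(loglik, log_trans, log_prior)
```

Taking a frame-by-frame argmax over the two GMM scores would reproduce the independent-decision behaviour described in the abstract; the Viterbi pass is what lets a confident decision on one frame influence its more ambiguous neighbours.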
