Abstract

In this paper, a robust voice activity detection (VAD) method based on the multiple observation likelihood ratio test (MOLRT) is proposed. First, we apply a Wiener filter to the observed signal in the time domain, which helps mitigate some of the noise. We use the Wiener filter because VAD performs better in the high signal-to-noise ratio (SNR) range than in the low SNR range. Then, several ideas are proposed to improve the performance of the MOLRT-based VAD method. A conventional MOLRT-based method comprises three modules: likelihood ratio (LR) estimation, threshold setting, and a hangover technique. To improve the estimation accuracy of the LR, we adopt the unbiased minimum mean-square error (MMSE) algorithm to estimate the noise power spectrum in every frame. This is effective for LR estimation in the MOLRT-based VAD method because the LR is a function of the a priori and a posteriori SNRs, both of which depend on the noise estimate. In addition, to make the VAD method more robust, we propose a dynamic threshold setting technique that is tied to the minimum noise power spectrum, which allows the threshold to be updated to a suitable level according to the denoised signal. Last but most important, a novel hangover algorithm is introduced as an alternative to the conventional HMM-based hangover algorithm. In the novel hangover algorithm, the decision for the current frame is determined by the statistics of the following speech/non-speech detections based on the likelihood ratio test. The evaluation results reveal that the proposed method significantly outperforms the LRT baseline in VAD accuracy under both noise variations and low SNR conditions.
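To make the LR estimation and hangover steps concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard Gaussian statistical model, in which the per-bin log likelihood ratio is gamma * xi / (1 + xi) - log(1 + xi), with xi the a priori SNR and gamma the a posteriori SNR. The function names `frame_log_lr` and `lookahead_vote`, the parameters `lookahead` and `min_ratio`, and the reading of "following detections" as a majority vote over subsequent frames are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def frame_log_lr(noisy_power, noise_power, prior_snr):
    """Mean per-bin log likelihood ratio of one frame (Gaussian model).

    noisy_power: |Y_k|^2 for each frequency bin k
    noise_power: estimated noise power for each bin (e.g. from an MMSE tracker)
    prior_snr:   a priori SNR xi_k for each bin
    """
    total = 0.0
    for y2, lam_n, xi in zip(noisy_power, noise_power, prior_snr):
        gamma = y2 / lam_n  # a posteriori SNR
        total += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
    return total / len(noisy_power)


def lookahead_vote(raw_decisions, lookahead=4, min_ratio=0.5):
    """Hypothetical hangover: frame t is labelled speech when at least
    min_ratio of the raw decisions in frames t..t+lookahead are speech.
    This is an assumed interpretation, not the paper's exact rule.
    """
    n = len(raw_decisions)
    smoothed = []
    for t in range(n):
        window = raw_decisions[t:min(n, t + lookahead + 1)]
        smoothed.append(sum(window) >= min_ratio * len(window))
    return smoothed
```

In this sketch a raw decision would be obtained by comparing `frame_log_lr(...)` against a threshold; how that threshold is derived from the minimum noise power spectrum is not given in the abstract, so it is left out here.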
