Abstract

We propose a robust voice activity detection (VAD) method based on density ratio estimation. In highly noisy environments, the likelihood ratio test (LRT) is effective. Conventional LRT estimates both a speech model and a noise model, calculates the likelihood of each, and uses the ratio of these likelihoods to detect speech. However, the LRT needs only the likelihood ratio of the speech and noise models; the likelihoods of the individual models are not necessarily required. Density ratio estimation models the likelihood ratio function with kernels and estimates the ratio directly. Applying density ratio estimation to VAD requires attention to feature selection and noise adaptation, because density ratio estimation constrains the shape of the likelihood ratio function and speech is dynamic. This paper addresses both problems. To improve accuracy, the proposed method is combined with conventional LRT. Experimental results on CENSREC-1-C show that the proposed method is more effective than conventional methods, especially in non-stationary noisy environments.
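
As an illustration of estimating a likelihood ratio directly with a kernel model, the following is a minimal sketch and not the method evaluated in the paper. It assumes a particular estimator, unconstrained least-squares importance fitting (uLSIF), which the abstract does not name, and it uses synthetic scalar features in place of real acoustic features. All function names, the kernel width `sigma`, and the regularizer `lam` are hypothetical choices made for this example.

```python
import math
import random


def gaussian_kernel(x, c, sigma):
    """Gaussian kernel between a scalar feature x and a kernel center c."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_density_ratio(speech, noise, centers, sigma=1.0, lam=0.1):
    """Fit r(x) ~= p_speech(x) / p_noise(x) as a weighted sum of kernels.

    Assumed estimator: uLSIF. No speech or noise density is fitted;
    only the ratio model r(x) = sum_l alpha_l * K(x, c_l) is estimated.
    """
    b = len(centers)
    # h[l]: mean kernel response over speech (numerator) samples.
    h = [
        sum(gaussian_kernel(x, c, sigma) for x in speech) / len(speech)
        for c in centers
    ]
    # H[l][k]: mean product of kernel responses over noise (denominator) samples.
    phi = [[gaussian_kernel(x, c, sigma) for c in centers] for x in noise]
    H = [
        [sum(p[l] * p[k] for p in phi) / len(noise) for k in range(b)]
        for l in range(b)
    ]
    for l in range(b):
        H[l][l] += lam  # ridge regularization
    # Closed-form solution, with negative weights clipped to zero.
    alpha = [max(0.0, a) for a in solve_linear(H, h)]
    return lambda x: sum(
        a * gaussian_kernel(x, c, sigma) for a, c in zip(alpha, centers)
    )


# Synthetic stand-ins for frame-level features (assumption, not real data).
random.seed(0)
speech = [random.uniform(3.0, 5.0) for _ in range(50)]
noise = [random.uniform(-1.0, 1.0) for _ in range(50)]
centers = speech[:10]  # kernel centers taken from speech samples

ratio = fit_density_ratio(speech, noise, centers)
print(ratio(4.0), ratio(0.0))  # speech-like frame vs. noise-like frame
```

A frame would be labeled as speech when `ratio(x)` exceeds a chosen threshold. Because the kernel centers sit on speech samples, the estimated ratio is forced toward zero far from them, which is one way to see how the kernel model constrains the shape of the likelihood ratio function.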

