Abstract

Voice activity detection (VAD) remains difficult in noisy environments because the statistical distributions of speech and non-speech features overlap heavily. Speech is a special type of acoustic signal that occupies only a small fraction of the whole acoustic space. On this basis, we previously proposed a speech processing method for VAD that constrains the processing space to a reproducing kernel Hilbert space (RKHS) [1]. In the RKHS, speech estimation is formulated as a function approximation problem. Through regularization in the RKHS framework, a target function is learned that approximates the speech signal while the noise component is smoothed out. The framework incorporates nonlinear mapping functions into the approximation implicitly through a kernel function, so the approximating function can capture the nonlinear and high-order statistical structure of speech. Our VAD algorithm is based on the signal power in this regularized RKHS, and its performance was previously evaluated on the CENSREC-1-C corpus for the VAD task [1]. In this paper, we quantify how well the method discriminates speech from non-speech and compare it with several classical VAD algorithms. Experimental results show that the proposed processing enhances the discriminability between the speech and non-speech distributions and achieves better VAD performance than the classical algorithms.
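The idea of "regularized function approximation in an RKHS, then a power-based decision" can be illustrated with a minimal sketch. It is not the authors' algorithm from [1]: the Gaussian kernel over sample indices, the kernel width `width`, the regularization weight `lam`, the frame length, the fixed power `threshold`, and every function name below are hypothetical choices made only for illustration. Each frame is fitted by kernel ridge regression (regularized least squares in the RKHS), which by the representer theorem reduces to solving a linear system. The VAD decision then thresholds the mean power of the fitted signal.

```python
import numpy as np


def gaussian_gram(n: int, width: float) -> np.ndarray:
    """Gram matrix of a Gaussian kernel over sample indices 0..n-1 (assumed kernel choice)."""
    t = np.arange(n, dtype=float)
    diff = t[:, None] - t[None, :]
    return np.exp(-diff ** 2 / (2.0 * width ** 2))


def rkhs_approximate(frame: np.ndarray, width: float = 2.0, lam: float = 1.0) -> np.ndarray:
    """Regularized least-squares fit of one frame in the Gaussian-kernel RKHS.

    Minimizes sum_i (y_i - f(t_i))^2 + lam * ||f||_H^2. By the representer
    theorem f = K @ alpha with alpha = (K + lam * I)^{-1} y, which acts as a
    smoother that attenuates rapidly varying (noise-like) components.
    """
    n = len(frame)
    K = gaussian_gram(n, width)
    alpha = np.linalg.solve(K + lam * np.eye(n), frame)
    return K @ alpha


def frame_power(signal, frame_len: int = 64, width: float = 2.0, lam: float = 1.0) -> np.ndarray:
    """Mean power of the RKHS approximation in each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    return np.array([np.mean(rkhs_approximate(f, width, lam) ** 2) for f in frames])


def detect_speech(signal, threshold: float, frame_len: int = 64,
                  width: float = 2.0, lam: float = 1.0) -> np.ndarray:
    """Hypothetical decision rule: a frame is speech if its regularized power exceeds threshold."""
    return frame_power(signal, frame_len, width, lam) > threshold


if __name__ == "__main__":
    # Synthetic stand-in for speech: a low-frequency tone in frames 4-7, white noise everywhere.
    rng = np.random.default_rng(0)
    noise = 0.1 * rng.standard_normal(12 * 64)
    tone = np.zeros_like(noise)
    tone[4 * 64 : 8 * 64] = np.sin(2 * np.pi * 0.02 * np.arange(4 * 64))
    print(detect_speech(noise + tone, threshold=0.05))
```

Because the eigenvalues of `K @ inv(K + lam * I)` lie in [0, 1), the fit never increases a frame's power and shrinks broadband noise more than smooth, low-frequency structure. This is the intuition behind separating speech and non-speech by power in the regularized space; the real method's kernel and decision rule should be taken from [1].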
