Abstract

Voice activity detection (VAD) is a key technique in many speech applications. Existing VAD algorithms perform unsatisfactorily under nonstationary noise and at low signal-to-noise ratios (SNRs). Motivated by the fact that people are able to distinguish speech from non-speech even at low SNRs, this paper studies VAD from a pattern recognition point of view, formulating it as a binary classification problem in which the speech signal is classified into speech and non-speech segments. A support vector machine (SVM) with a radial basis function (RBF) kernel is trained in a supervised manner, which makes it well suited to binary classification tasks given a set of labeled training samples. To improve the accuracy and noise robustness of VAD, feature selection is carried out by introducing a class separation measure (CSM) criterion that evaluates how well the extracted feature vectors discriminate between speech and non-speech segments. The most widely used speech features are considered, including Mel-frequency cepstral coefficients (MFCC), principal component analysis of the MFCC (PCA-MFCC), linear predictive coding (LPC) and linear predictive cepstral coefficients (LPCC). Extensive experimental results show that the MFCC features capture the most relevant speech information and retain good class separability under different noise conditions, as do the PCA-MFCC features. Moreover, the PCA-MFCC features are more robust to noise and have a lower computational cost. On this basis, a VAD method that uses PCA-MFCC features with an RBF-SVM classifier is developed, termed PCA-SVM-VAD. Experimental results on the NOIZEUS database show that the proposed PCA-SVM-VAD method clearly improves on other VAD methods and is considerably more robust in car noise at various SNRs.
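The CSM criterion is the step used to rank candidate feature types (MFCC, PCA-MFCC, LPC, LPCC) by how well they separate speech from non-speech frames. The abstract does not give the CSM formula, so the following is only a minimal sketch under an assumed, commonly used scatter-matrix form: the trace ratio J = tr(S_b) / tr(S_w) of between-class to within-class scatter. The function names `mean_vector` and `class_separation_measure` are hypothetical, and frames are represented as plain lists of feature values (for example, per-frame MFCC vectors).

```python
from typing import Sequence

Vector = Sequence[float]


def mean_vector(samples: Sequence[Vector]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(s[d] for s in samples) / n for d in range(dim)]


def class_separation_measure(speech: Sequence[Vector],
                             nonspeech: Sequence[Vector]) -> float:
    """Assumed CSM: trace ratio J = tr(S_b) / tr(S_w) for two classes.

    This is an illustrative stand-in, not necessarily the paper's CSM.
    A larger J means the class means lie far apart relative to how widely
    frames spread around their own class mean.
    """
    classes = [speech, nonspeech]
    if any(len(c) == 0 for c in classes):
        raise ValueError("each class needs at least one frame")
    total = sum(len(c) for c in classes)
    overall = mean_vector([v for c in classes for v in c])

    tr_sb = 0.0  # trace of the between-class scatter matrix
    tr_sw = 0.0  # trace of the within-class scatter matrix
    for c in classes:
        prior = len(c) / total  # class prior estimated from frame counts
        mu = mean_vector(c)
        # tr(S_b) contribution: prior * squared distance of class mean to overall mean
        tr_sb += prior * sum((m - o) ** 2 for m, o in zip(mu, overall))
        # tr(S_w) contribution: prior * mean squared distance of frames to class mean
        spread = sum(sum((x - m) ** 2 for x, m in zip(v, mu)) for v in c)
        tr_sw += prior * spread / len(c)

    if tr_sw == 0.0:
        raise ValueError("within-class scatter is zero; J is undefined")
    return tr_sb / tr_sw
```

For instance, two tight clusters of frames around (1, 1) and (-1, -1) give a large J, whereas two sets of frames that share the same mean give J = 0. Under this assumed criterion, the feature type with the larger J would be preferred as input to the RBF-SVM classifier.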
