Abstract

Many features have been proposed in the literature for voice activity detection (VAD). Shen et al. [20] first used a spectral-entropy-based feature to detect speech spurts under noisy conditions. However, VAD based on this feature is unreliable when the noise level greatly exceeds the speech level. To improve the performance of spectral-entropy-based VAD at low signal-to-noise ratios (SNRs), the spectrum of each frame was divided into sub-bands and the spectral entropy was computed over each band. These band entropies were then weighted and summed to obtain the overall entropy, with the weights found empirically from the amount of noise in each band. This approach was named banded spectral entropy (BSE) [21]. In [24], a deviation threshold computed from an approximate ramp line and the sorted spectral coefficients of each band is used to classify bands as useful or useless. In this paper, we propose a novel Teager Energy Band Spectral Entropy (TE_BSE) feature for VAD. We first enhance the spectral peaks by applying the Teager energy operator to each frequency-transformed speech frame. The spectrum is then divided into sub-bands, and the entropy is computed over each band. The entropies of the useful bands are summed to obtain the TE_BSE feature, with useful and useless bands identified following [24]. We report the performance of the proposed VAD in terms of probability of detection ($P_D$), probability of false alarm ($P_{FA}$), and probability of error under different noise types and SNRs. Finally, on a real-world sample, the proposed VAD outperforms the statistical model-based VAD of Sohn et al. [8], improving $P_D$ without increasing $P_{FA}$.
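The steps named in the abstract (Teager energy on the transformed frame, sub-band split, per-band entropy, sum over useful bands) can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation. The function names, the equal-width band split, the per-band normalisation, the use of the absolute value for negative Teager outputs, and the `useful` mask (a stand-in for the band selection of [24], which the abstract does not detail) are all assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """One-sided DFT magnitudes of a real frame (naive O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def teager_energy(values):
    """Discrete Teager energy psi[k] = x[k]^2 - x[k-1] * x[k+1] along the bin axis.

    The two edge bins are dropped. Negative outputs are replaced by their
    absolute value so they can be normalised later (assumption).
    """
    return [abs(values[k] ** 2 - values[k - 1] * values[k + 1])
            for k in range(1, len(values) - 1)]


def band_entropy(band, eps=1e-12):
    """Shannon entropy (natural log) of one band, normalised within the band (assumption).

    eps acts as a small floor so that near-silent bands contribute almost nothing.
    """
    total = sum(band) + eps
    probs = [v / total for v in band]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def te_bse(frame, num_bands=4, useful=None):
    """Sum of band entropies of the Teager-enhanced spectrum over the useful bands.

    `useful` is a hypothetical boolean mask standing in for the useful/useless
    band decision of [24]; by default every band is kept. Bins left over after
    the equal-width split are ignored.
    """
    te = teager_energy(magnitude_spectrum(frame))
    width = len(te) // num_bands
    bands = [te[i * width:(i + 1) * width] for i in range(num_bands)]
    if useful is None:
        useful = [True] * num_bands
    return sum(band_entropy(b) for b, keep in zip(bands, useful) if keep)
```

Under these assumptions, a frame whose energy sits in a few spectral peaks yields a low TE_BSE value, while a frame with energy spread across bins yields a higher one. A detector would then compare the per-frame value against a threshold, which this sketch does not attempt to choose.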
