Abstract

This paper proposes a voice activity detection (VAD) algorithm based on a novel long-term metric. Assuming that the most significant difference between noisy speech and non-speech is the harmonicity of the noisy speech spectrum, which arises from human voice production, the long-term auto-correlation statistics (LTACS) measure is designed and shown to be a powerful metric for VAD. The LTACS measure is calculated over several successive frames around the current frame, so it represents the significance of the harmonics in the signal spectrum over a long term rather than a short term. A novel LTACS-based VAD algorithm is derived by jointly using a minimum operator to reduce the variability of non-speech and a variance calculation to detect speech. Simulation comparisons with four standardized VAD algorithms (ETSI adaptive multi-rate (AMR) options 1 and 2, ETSI advanced front-end, and G.729 Annex B) and three previously proposed VAD algorithms show that the proposed LTACS-based VAD algorithm achieves the best performance under all SNR conditions, especially in highly noisy environments (e.g., at SNRs of -5 dB and -10 dB).
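The abstract describes the pipeline only at a high level and gives no formula for LTACS. The Python sketch below is one hypothetical reading of it, not the authors' method. Each frame's harmonicity is measured as the normalized autocorrelation of its log-magnitude spectrum, since regularly spaced harmonics produce peaks at the lag equal to their spacing in bins. The minimum is then taken lag by lag over several successive frames around the current one, and the variance of that minimum across lags is compared with a threshold. All function names and parameter values (512-sample frames, 256-sample hop, `context=2`, `max_lag=40`, `threshold=0.02`) are illustrative assumptions.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])


def spectral_autocorrelation(frame: np.ndarray, max_lag: int) -> np.ndarray:
    """Normalized autocorrelation (lags 1..max_lag) of one frame's log-magnitude spectrum.

    Assumption (hypothetical feature): harmonics spaced F0 apart make this vector
    peak at the lag equal to the harmonic spacing in bins; noise has no such peaks.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    s = np.log(spectrum + 1e-10)  # small floor avoids log(0)
    s = s - s.mean()
    energy = np.dot(s, s) + 1e-12
    return np.array([np.dot(s[:-k], s[k:]) / energy for k in range(1, max_lag + 1)])


def ltacs_statistic(frames: np.ndarray, t: int, context: int = 2, max_lag: int = 40) -> float:
    """Hypothetical long-term statistic for frame t using frames t-context..t+context."""
    lo, hi = max(0, t - context), min(len(frames), t + context + 1)
    r = np.stack([spectral_autocorrelation(f, max_lag) for f in frames[lo:hi]])
    # Minimum over successive frames, lag by lag: a peak survives only if every
    # frame in the window shows it, which suppresses chance peaks from noise.
    r_min = r.min(axis=0)
    # Variance across lags: large when a persistent harmonic pattern remains.
    return float(np.var(r_min))


def vad(x: np.ndarray, frame_len: int = 512, hop: int = 256, context: int = 2,
        max_lag: int = 40, threshold: float = 0.02) -> np.ndarray:
    """Per-frame decisions (True = speech); the threshold is a placeholder value."""
    frames = frame_signal(x, frame_len, hop)
    return np.array([ltacs_statistic(frames, t, context, max_lag) > threshold
                     for t in range(len(frames))])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000
    times = np.arange(fs) / fs
    # Synthetic "voiced" signal: 200 Hz fundamental plus harmonics, lightly noisy.
    voiced = sum(np.sin(2 * np.pi * 200 * h * times) for h in range(1, 20))
    voiced = voiced + 0.1 * rng.standard_normal(fs)
    noise = rng.standard_normal(fs)
    print("speech fraction (voiced):", vad(voiced).mean())
    print("speech fraction (noise): ", vad(noise).mean())
```

Taking the minimum before the variance means a spectral-autocorrelation peak counts only if it persists across the whole window, which is one way to read "reduce non-speech variability"; a practical detector would tune the window length, lag range, and threshold on labeled data.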
