A robust voice activity detection algorithm based on the harmonic-to-noise ratio (HNR) is proposed. The HNR is high in voice segments because most of the voice energy is concentrated in the harmonic structure. However, estimating the harmonic frequencies of noisy speech is unreliable or complicated, and the HNR computed over the full frequency band is not robust in environments with non-stationary band-limited noise. In this paper, several harmonic templates whose fundamental frequencies are spaced in log-scale steps are used to match the wide-band voice harmonic structure, so the fundamental frequency need not be estimated. To avoid non-stationary band-limited noise, contaminated frequencies are neglected automatically by frequency bin selection, which discards the harmonic and noise bins with the highest and lowest energy so that the main, clean harmonic structure is kept. The final voice activity decision is based on the HNR of consecutive frames, and the algorithm shows robust performance on several databases.
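The template-matching idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the 80–400 Hz template grid, the 100–4000 Hz analysis band, the placement of noise bins midway between harmonics, the 20% trimming fraction, and the moving-average decision rule are all assumptions chosen for illustration only.

```python
import numpy as np


def trimmed_mean(values, trim):
    """Mean after dropping the `trim` fraction of lowest and highest values."""
    values = np.sort(values)
    k = int(len(values) * trim)
    core = values[k:len(values) - k] if len(values) > 2 * k else values
    return core.mean()


def template_hnr_db(frame, sr, f0_grid, band=(100.0, 4000.0), trim=0.2):
    """Best HNR (dB) over a set of harmonic templates; no pitch estimation.

    Assumption: noise bins are taken midway between adjacent harmonics, and
    bin selection is approximated by trimming both energy extremes.
    """
    n = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2
    lo, hi = band[0], min(band[1], sr / 2)

    def to_bins(freqs):
        # Map frequencies in Hz to the nearest valid FFT bin index.
        return np.clip(np.rint(freqs * n / sr).astype(int), 0, len(power) - 1)

    best = -np.inf
    for f0 in f0_grid:
        harm_f = f0 * np.arange(1, int(hi // f0) + 1)
        harm_f = harm_f[harm_f >= lo]
        noise_f = harm_f + 0.5 * f0
        noise_f = noise_f[noise_f <= hi]
        h_idx = np.unique(to_bins(harm_f))
        n_idx = np.setdiff1d(to_bins(noise_f), h_idx)
        if len(h_idx) == 0 or len(n_idx) == 0:
            continue
        harmonic = trimmed_mean(power[h_idx], trim)
        noise = trimmed_mean(power[n_idx], trim)
        best = max(best, 10 * np.log10((harmonic + 1e-12) / (noise + 1e-12)))
    return best


def vad_decision(hnr_db_track, threshold_db=10.0, span=5):
    """Hypothetical decision rule: threshold a moving average of per-frame HNR."""
    kernel = np.ones(span) / span
    return np.convolve(hnr_db_track, kernel, mode="same") > threshold_db


# Usage: one synthetic harmonic frame versus one white-noise frame.
sr, n = 16000, 1024
t = np.arange(n) / sr
rng = np.random.default_rng(0)
voiced = sum(np.cos(2 * np.pi * 150.0 * k * t) for k in range(1, 27))
voiced = voiced + 0.01 * rng.standard_normal(n)
noise_only = rng.standard_normal(n)
f0_grid = np.geomspace(80.0, 400.0, 24)  # log-scale steps between templates
print(template_hnr_db(voiced, sr, f0_grid))
print(template_hnr_db(noise_only, sr, f0_grid))
```

Taking the maximum over templates is what removes the need for an explicit pitch estimate in this sketch, and the trimmed mean is only a rough stand-in for the bin selection step, whose exact rule is not specified here.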