Abstract
Voice activity detection (VAD) discriminates speech segments from background noise and is a critical step in numerous speech-related applications. However, distinguishing speech from noise based on the properties of the noise is unreliable, because real-life noise is difficult to predict and characterise. In this study, we instead focus on the intrinsic characteristics of speech. The harmonic peaks of vowel sounds have higher energies than the other spectral components of speech and are therefore the speech features most likely to survive severe noise. Consequently, the energy differences between harmonic peaks and other spectral components show promise for enabling robust VAD. To exploit this feature, the harmonic peaks must be accurately located. For this purpose, this study proposes an efficient harmonic peak location detection (HPD) method. In extensive experiments across various noise types and signal-to-noise ratios, VAD with the proposed HPD approach outperformed existing VAD methods, with higher robustness and reasonable computational cost.
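The energy-difference idea can be illustrated with a minimal sketch. It is not the proposed HPD method: it assumes the fundamental frequency is already known and falls exactly on a DFT bin, whereas locating the peaks is the problem HPD addresses. The function names (`bin_power`, `harmonic_contrast_db`), the synthetic vowel, the frame length, and the noise level are all hypothetical choices made for illustration.

```python
import cmath
import math
import random


def bin_power(frame, k):
    """Power of DFT bin k of `frame`, computed directly (no FFT library)."""
    n_samples = len(frame)
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
              for n, x in enumerate(frame))
    return abs(acc) ** 2


def harmonic_contrast_db(frame, f0_bin, n_harmonics=10):
    """Ratio (in dB) of power at harmonic bins to power midway between them.

    Assumption: f0_bin, the fundamental expressed in DFT bins, is given.
    Finding it is the peak-location step, which this sketch does not do.
    """
    harmonics = range(1, n_harmonics + 1)
    peaks = sum(bin_power(frame, h * f0_bin) for h in harmonics)
    valleys = sum(bin_power(frame, h * f0_bin + f0_bin // 2) for h in harmonics)
    eps = 1e-12  # guards against log of zero on a noise-free frame
    return 10 * math.log10((peaks + eps) / (valleys + eps))


# Synthetic test frames (illustrative values, not data from the study).
N, F0_BIN = 256, 8
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.5) for _ in range(N)]
# Crude vowel stand-in: ten harmonics with amplitudes decaying as 1/h.
vowel = [sum(math.sin(2 * math.pi * h * F0_BIN * n / N) / h
             for h in range(1, 11))
         for n in range(N)]
noisy_vowel = [v + w for v, w in zip(vowel, noise)]

speech_score = harmonic_contrast_db(noisy_vowel, F0_BIN)
noise_score = harmonic_contrast_db(noise, F0_BIN)
print(f"noisy vowel: {speech_score:.1f} dB, noise only: {noise_score:.1f} dB")
```

A frame-level detector could threshold this contrast, but the threshold and the handling of unvoiced speech are outside what the abstract specifies, so they are left out here.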