Abstract

A simple but efficient voice activity detector based on the Hilbert transform and a dynamic threshold is presented for use in the pre-processing of audio signals. The algorithm that defines the dynamic threshold is a modification of a convex combination found in the literature. This scheme allows the detection of prosodic and silence segments in speech under non-ideal conditions such as spectrally overlapping noise. The present work shows preliminary results on a database built from political speeches. The tests were performed by adding artificial noise on top of the natural noise already present in the audio signals, and several algorithms are compared. In future work, the results will be extrapolated to adaptive filtering of monophonic signals and to the analysis of speech pathologies.
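As a rough illustration of the idea described in the abstract, the sketch below computes a Hilbert envelope and compares it against a threshold formed as a convex combination of the envelope's minimum and maximum. The abstract does not give the authors' modified combination, so the form `lam * max + (1 - lam) * min`, the weight `lam`, and all function names here are assumptions made only for illustration. The naive O(N^2) DFT keeps the example dependency-free; in practice a library routine such as `scipy.signal.hilbert` would normally be used.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive O(N^2) DFT (short inputs only)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double positive bins, zero negative bins.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        gain[k] = 2.0
    return [
        sum(spectrum[k] * gain[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def envelope(x):
    """Hilbert envelope: magnitude of the analytic signal."""
    return [abs(z) for z in analytic_signal(x)]


def dynamic_threshold(env, lam=0.3):
    """Assumed convex combination of the envelope extremes (not the paper's exact rule)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] for a convex combination")
    return lam * max(env) + (1.0 - lam) * min(env)


def detect(x, lam=0.3):
    """Flag each sample as active when its envelope exceeds the threshold."""
    env = envelope(x)
    thr = dynamic_threshold(env, lam)
    return [e > thr for e in env]


# Toy signal: 32 silent samples followed by 32 samples of a sine burst.
toy = [0.0] * 32 + [math.sin(2 * math.pi * 8 * t / 64) for t in range(32, 64)]
flags = detect(toy)
```

Because the threshold is recomputed from the envelope of the data being analysed, it moves with the signal level rather than staying fixed, which is the property the abstract attributes to the dynamic threshold.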

Highlights

  • Many authors label sections of speech as voiced, where the vocal cords vibrate and produce sound; unvoiced, where the vocal cords do not vibrate; and silence [1] [2]

  • For the tests, a database was built from political speeches published on the internet. These speeches were recorded in noisy environments that can disturb voice activity detection, and the noise overlaps spectrally with the speech signal

  • Voice activity detection plays an important role in applications such as emotion detection in patients with diseases or emotional disorders, remote monitoring of these patients, the study of vocal tract pathologies, and others


Summary

Introduction

Many authors label sections of speech as voiced, where the vocal cords vibrate and produce sound; unvoiced, where the vocal cords do not vibrate; and silence [1] [2]. The zero-crossing rate and the linear prediction coefficients can be combined so that the distance between them indicates whether the analysed segment is speech or a silent pause [1], or they can be used with a fixed or dynamic threshold to detect speech [5]. This work uses features of the signal itself, such as the zero-crossing rate and the signal energy in a given window, to determine a dynamic threshold.
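The two features named above can be computed per window as in the following minimal sketch. The non-overlapping framing, the frame length, the normalisation of each feature, and the function names are assumptions for illustration. How the paper combines these features into its threshold is not specified in this summary, so that step is left out.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def frame_features(signal, frame_len):
    """Split the signal into non-overlapping frames and return (zcr, energy) per frame."""
    return [
        (zero_crossing_rate(signal[i:i + frame_len]),
         short_time_energy(signal[i:i + frame_len]))
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


# A rapidly alternating frame followed by a constant, lower-energy frame.
features = frame_features([1, -1, 1, -1, 0.5, 0.5, 0.5, 0.5], frame_len=4)
```

In this toy input the first frame changes sign at every step and the second never does, so the zero-crossing rate separates them even though neither frame is silent. The energy value separates them by level instead.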
