Abstract

In many speech signal processing applications, voice activity detection (VAD) plays an essential role in separating an audio stream into time intervals that contain speech activity and time intervals where speech is absent. Many features that reflect the presence of speech have been introduced in the literature. However, to our knowledge, no extensive comparison has been provided yet. In this article, we therefore present a structured overview of several established VAD features that target different properties of speech. We categorize the features with respect to the properties that are exploited, such as power, harmonicity, or modulation, and evaluate the performance of some dedicated features. The importance of temporal context is discussed in relation to latency restrictions imposed by different applications. Our analyses allow for selecting promising VAD features and finding a reasonable trade-off between performance and complexity.

Highlights

  • Today, speech-controlled applications and devices that support human speech communication are becoming increasingly popular

  • With the objective of categorizing features by the speech properties they exploit, we have given an overview of established approaches

  • Our analyses showed that the performance of features varies, even when they exploit the same speech property


Summary

Introduction

Speech-controlled applications and devices that support human speech communication are becoming increasingly popular. With the use of mobile devices, availability is no longer limited to a certain place; instead, it is possible to communicate in almost any situation. Efficient and convenient human-computer interfaces based on speech recognition allow us to control devices using spoken commands and to dictate text. Even hearing-impaired persons benefit from advanced speech signal processing: modern hearing aid devices amplify the desired speech signal and suppress interfering noise components. Although the use cases for speech signal processing vary, the algorithms involved face a common challenge: based on a signal that is corrupted with noise, the presence of speech has to be detected before the signal is further processed.

