Hybrid voice activity detection system based on LSTM and auditory speech features

Yunus Korkmaz, Aytuğ Boyacı

doi:10.1016/j.bspc.2022.104408

Abstract

Voice Activity Detection (VAD), sometimes called Speech Activity Detection, is the process of extracting speech regions from audio recordings that contain many types of sounds. Because irrelevant data adds computational cost and wastes time, most speech-based applications consider only the speech part (the region of interest) and ignore the rest. This is the main reason VAD serves as a preliminary stage in applications such as automatic speech recognition (ASR), speaker identification/verification, speech enhancement, and speaker diarization. In this study, a successful two-stage, semi-supervised VAD system, which we named “hybridVAD”, was proposed, especially for environments with a high signal-to-noise ratio (SNR). First, a VAD decision was obtained from a relatively simple Long Short-Term Memory (LSTM) network trained on auditory speech features such as energy, zero-crossing rate (ZCR), and 13th-order Mel-Frequency Cepstral Coefficients (MFCC). We then applied a reasonable thresholding strategy to the same features to obtain a second VAD decision and combined the two decisions with logical operators. The results showed, surprisingly, that the final VAD decision has low FEC (front-end clipping) and OVER (speech decision carried over into the following non-speech) errors, which are particularly critical for any speaker diarization system, mostly in high-SNR environments.
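
The abstract does not specify the thresholding rule, the frame settings, or which logical operator hybridVAD uses, so the following Python sketch only illustrates the general two-stage idea under stated assumptions. It computes frame-level energy and ZCR, applies a hypothetical energy-and-ZCR threshold rule, and fuses that decision with a second decision using AND or OR. The LSTM itself is not implemented: `lstm_dec` is a hand-written stand-in for its frame-level output, and MFCCs are omitted. The thresholds, the 10 ms frames, and the simplified frame counting in `fec_over_frames` (FEC: speech frames missed at the start of a reference segment; OVER: non-speech frames still marked as speech right after a segment ends) are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return changes / (len(frame) - 1)


def threshold_vad(frames: Sequence[Sequence[float]],
                  energy_thr: float, zcr_thr: float) -> List[bool]:
    """Hypothetical rule (not the paper's): speech when energy is high
    and ZCR is below a noise-like level."""
    return [short_time_energy(f) > energy_thr and zero_crossing_rate(f) < zcr_thr
            for f in frames]


def fuse(a: Sequence[bool], b: Sequence[bool], op: str = "and") -> List[bool]:
    """Combine two frame-level decisions with a logical operator."""
    if op == "and":
        return [x and y for x, y in zip(a, b)]
    if op == "or":
        return [x or y for x, y in zip(a, b)]
    raise ValueError(f"unsupported operator: {op}")


def fec_over_frames(ref: Sequence[bool], hyp: Sequence[bool]) -> Tuple[int, int]:
    """Simplified counts: FEC = speech frames missed at the start of each reference
    segment; OVER = non-speech frames still marked as speech right after it ends."""
    fec = over = 0
    n = len(ref)
    for i in range(n):
        if ref[i] and (i == 0 or not ref[i - 1]):      # reference segment onset
            j = i
            while j < n and ref[j] and not hyp[j]:
                fec += 1
                j += 1
        if not ref[i] and i > 0 and ref[i - 1]:        # reference segment offset
            j = i
            while j < n and not ref[j] and hyp[j]:
                over += 1
                j += 1
    return fec, over


# Synthetic 8 kHz example: 50 ms silence, 100 ms of a 200 Hz tone, 50 ms silence.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
signal = [0.0] * 400 + tone + [0.0] * 400
frames = frame_signal(signal, frame_len=80, hop=80)    # 10 ms frames, no overlap

ref = [5 <= k < 15 for k in range(len(frames))]        # ground-truth speech frames
thr_dec = threshold_vad(frames, energy_thr=0.01, zcr_thr=0.3)
lstm_dec = [6 <= k < 17 for k in range(len(frames))]   # hypothetical stand-in for LSTM output

for op in ("and", "or"):
    print(op, fec_over_frames(ref, fuse(thr_dec, lstm_dec, op)))
```

Running it prints `and (1, 0)` and then `or (0, 2)`. On this toy input, AND removes the overhang contributed by the late-ending stand-in decision but keeps its one-frame onset clipping, while OR does the reverse. Which operator suits a given application is a design choice, and the combination used in the paper may differ.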

Journal: Biomedical Signal Processing and Control
Publication Date: Nov 17, 2022
Citations: 10

Similar Papers

Unsupervised and supervised VAD systems using combination of time and frequency domain features
Yunus Korkmaz ... Aytuğ Boyacı
Biomedical Signal Processing and Control, Vol. 61 | 15 Jun 2020

Decision Robustness of Voice Activity Segmentation in Unconstrained Mobile Speaker Recognition Environments
Andreas Nautsch ... Christoph Busch
01 Sep 2016

Detection, diarization, and transcription of far-field lecture speech
Jing Huang ... Karthik Visweswariah
27 Aug 2007

A new algorithm for voice activity detection
Jianqiang Wei ... Hui Zeng
25 May 2003