Decision Robustness of Voice Activity Segmentation in Unconstrained Mobile Speaker Recognition Environments

Andreas Nautsch, Reiner Bamberger, Christoph Busch

doi:10.1109/biosig.2016.7736916

Abstract

Voice activity detection (VAD) is an essential segmentation process in speaker recognition systems, separating speech and non-speech segments of voice samples. In speaker recognition, references are modelled purely from speech segments. Different VAD segmentations lead to variations in biometric models, and consequently in system performance. Thus, VAD decisions need to be robust across different conditions. In this paper, the decision robustness of different VAD algorithms is examined on mobile data by simulating different environmental noise conditions, for which we propose a Hamming distance based analysis. By examining speech and speaker recognition based VADs, we further propose to extend a well-performing VAD algorithm, which is based on likelihood ratio comparison of speech to non-speech models, by including most dominant frequency component (MDFC) features for selection of model training segments. Thereby, VAD decisions become 7% more robust, while sustaining an average EER SNR-sensitivity of 0.76% per dB SNR.
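As a rough illustration of what a Hamming distance based comparison of VAD decisions could look like, the sketch below counts the frames on which two binary speech/non-speech decision sequences disagree. The abstract does not give the paper's exact formulation, so the function names, the per-frame 0/1 encoding, the normalisation by sequence length, and the toy data are all assumptions made for this example.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the frames on which two VAD decision sequences disagree.

    Assumption: each sequence holds one 0/1 decision per frame
    (1 = speech, 0 = non-speech) and both cover the same frames.
    """
    if len(a) != len(b):
        raise ValueError("decision sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def normalized_hamming(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of frames whose decision differs (0.0 = identical)."""
    if len(a) == 0:
        raise ValueError("decision sequences must not be empty")
    return hamming_distance(a, b) / len(a)


# Hypothetical toy data: decisions of one VAD on a clean sample and on
# the same sample with simulated environmental noise added.
clean = [0, 1, 1, 1, 0, 0, 1, 1]
noisy = [0, 1, 0, 1, 0, 1, 1, 1]

print(hamming_distance(clean, noisy))    # frames that changed decision
print(normalized_hamming(clean, noisy))  # share of frames that changed
```

Under this reading, a smaller distance between the clean and noisy decisions would indicate a more robust VAD; how the paper aggregates such distances across noise conditions is not stated in the abstract.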

Similar Papers

Spectral matching based voice activity detector for improved speaker recognition
K T Sreekumar ... C Santhosh Kumar
01 Jan 2014

Speaker Identification Using Empirical Mode Decomposition-Based Voice Activity Detection Algorithm under Realistic Conditions
M.S Rudramurthy ... V Kamakshi Prasad
Journal of Intelligent Systems | VOL. 23
02 Apr 2014

Speaker identification using convolutional-long short-term memory neural networks
Serkan Tokgoz ... Issa M Panahi
The Journal of the Acoustical Society of America | VOL. 146
01 Oct 2019

Speaker Recognition with VAD
Jian Ling ... Jianwei Zhu
01 Jun 2009
