Abstract

This paper addresses the problem of Voice Activity Detection (VAD), which is used in speaker diarization and detection systems to detect silent regions in a speech stream. An inaccurate VAD can seriously degrade the performance of these systems. To overcome this problem, we propose a new VAD architecture that combines two VADs: the first operates in the temporal domain and the second in the cepstral domain. Classical VAD architectures use either a temporal VAD or a cepstral one to eliminate the silent segments in audio documents. In this work, we propose to use both VADs in order to reduce silence detection errors. The new VAD architecture is evaluated on speaker diarization and detection systems that use the Bayesian Information Criterion (BIC) distance and two different classifiers: a Gaussian Mixture Model (GMM) classifier for the segmentation task and a Hierarchical Ascending Clustering (HAC) or Support Vector Machine (SVM) classifier for the clustering task. We compare the proposed VAD architecture with the classical ones when each is applied in the preprocessing step of the speaker diarization/detection systems. For the evaluation, the different VAD architectures were tested on telephone conversations extracted from the NIST-2005 corpus. The experimental results show that the new VAD architecture considerably improves the performance of the speaker diarization/detection systems compared with the classical VAD architectures: it achieves a Diarization Error Rate (DER) of only 2.78% and a Speaker Detection Rate (SDR) of 97.67%.
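As an illustration of the BIC distance mentioned above, the following minimal sketch computes the standard delta-BIC score between two segments of feature frames, assuming each segment is modeled by a single Gaussian with diagonal covariance. The abstract does not specify the covariance type, the penalty weight, or the features used, so the function names (`diag_log_det`, `delta_bic`), the `penalty_weight` parameter, and the synthetic data are all assumptions made for this example and not the authors' implementation.

```python
import math
import random


def diag_log_det(frames):
    """Log-determinant of the diagonal covariance of a list of feature frames."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        column = [frame[k] for frame in frames]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0) on constant data
    return total


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between two segments (assumed diagonal-covariance Gaussians).

    A positive value favors modeling the segments separately, which suggests
    a speaker change; a negative value favors merging them.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    dim = len(seg_a[0])
    # Likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (
        n * diag_log_det(seg_a + seg_b)
        - n_a * diag_log_det(seg_a)
        - n_b * diag_log_det(seg_b)
    )
    # Penalty for the extra parameters: dim means plus dim variances.
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


# Hypothetical 2-dimensional features standing in for cepstral frames.
rng = random.Random(0)
speaker_1 = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
speaker_2 = [[rng.gauss(5.0, 1.0), rng.gauss(5.0, 1.0)] for _ in range(200)]
score = delta_bic(speaker_1, speaker_2)
```

In a segmentation or clustering loop, a score above zero would typically be read as evidence that the two segments come from different speakers; the threshold is tuned through `penalty_weight` in practice.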
