Abstract

A major cause of performance degradation in speaker verification (SV) systems under real-world conditions is the failure to locate speech regions in the presence of noise. This work examines the role of voice activity detection (VAD) methods in alleviating this shortcoming. Experiments are conducted on the core-core task of the Speakers in the Wild (SITW) challenge. Two VAD approaches are explored: the recently proposed self-adaptive VAD, and a method based on vowel-like region (VLR) detection. To evaluate their effectiveness, SV systems are developed using the i-vector framework in the front-end and probabilistic linear discriminant analysis (PLDA) in the back-end. The self-adaptive VAD based system outperforms the VLR based system in high signal-to-noise ratio (SNR) conditions, whereas under degraded conditions the VLR based method is relatively more robust. Exploiting this complementary behavior, score-level fusion of the two systems yields significant improvements in SV performance.
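
The abstract does not say how the scores of the two systems are fused. As an illustration only, the sketch below assumes one common choice: each system's trial scores are z-normalized and then combined with a weighted sum. The function names (`z_normalize`, `fuse_scores`) and the `weight_a` parameter are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale a list of scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no spread to normalize, so return zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted-sum fusion of two systems' scores over the same trials.

    ASSUMPTION: the paper's actual fusion rule is not given in the abstract;
    this weighted sum of z-normalized scores is only one plausible scheme.
    scores_a: e.g. PLDA scores from the self-adaptive VAD based system.
    scores_b: e.g. PLDA scores from the VLR based system.
    weight_a: hypothetical weight in [0, 1] given to system A.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same list of trials")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    norm_a = z_normalize(scores_a)
    norm_b = z_normalize(scores_b)
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(norm_a, norm_b)]
```

In practice the weight would be tuned on a development set, for example by minimizing the equal error rate, so that the system that is more reliable in the target condition contributes more to the fused score.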
