Abstract

In environments with background noise, the performance of speaker recognition (SR) systems degrades considerably. To estimate the model mismatch between training and evaluation data, we propose an intra Kullback-Leibler distance (intra-KLD) measure. Based on the intra-KLD, the performance of SR systems using speech enhancement (SE) and multi-condition (MC) training can be predicted with reduced computational complexity. Because SE cannot fully remove real-world noise without modifying the clean speech signal, an SR model trained only on clean speech cannot fully represent evaluation data that include various noisy signals preprocessed by SE. To compensate for this problem, we apply SE as a preprocessing block not only in the evaluation stage but also in the training stage. Moreover, we propose combining SE and MC training (SE-MC), in which various sets of features are extracted in the SE domain and a model for each speaker is trained on the mixture of SE-domain features. Under various background noise environments, SE, MC, and SE-MC produced SR error rates of 43.51%, 25.00%, and 20.29%, respectively.
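
The abstract does not define the intra-KLD measure, so the following is only a minimal sketch of the generic quantity it builds on: a symmetric Kullback-Leibler distance between two Gaussians with diagonal covariance, such as might summarize training and evaluation features. The function names `kl_diag_gauss` and `symmetric_kld` and the diagonal-Gaussian assumption are illustrative and are not taken from the paper.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance.

    Assumption: each distribution is described by a mean vector and a
    vector of per-dimension variances (all variances strictly positive).
    """
    mu_p = np.asarray(mu_p, dtype=float)
    var_p = np.asarray(var_p, dtype=float)
    mu_q = np.asarray(mu_q, dtype=float)
    var_q = np.asarray(var_q, dtype=float)
    # Closed form, summed over the independent dimensions.
    terms = np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    return 0.5 * float(np.sum(terms))


def symmetric_kld(mu_p, var_p, mu_q, var_q):
    """Symmetric KL distance: KL(p || q) + KL(q || p)."""
    return (kl_diag_gauss(mu_p, var_p, mu_q, var_q)
            + kl_diag_gauss(mu_q, var_q, mu_p, var_p))


if __name__ == "__main__":
    # Hypothetical 2-D statistics standing in for training and evaluation data.
    train_mu, train_var = [0.0, 0.0], [1.0, 1.0]
    eval_mu, eval_var = [1.0, 0.0], [1.0, 1.0]
    print(symmetric_kld(train_mu, train_var, eval_mu, eval_var))
```

A larger value indicates a greater mismatch between the two feature distributions; how the paper turns such a distance into a prediction of SR performance is not stated in the abstract.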
