Abstract

Background noise considerably degrades the performance of speaker recognition (SR) systems. To estimate the model mismatch between training and evaluation data, we propose an intra Kullback-Leibler distance (intra-KLD) measure. Based on the intra-KLD, the performance of SR systems using speech enhancement (SE) and multi-condition (MC) training can be predicted with reduced computational complexity. Since SE cannot fully remove real-world noise without modifying the clean speech signal, an SR model trained only on clean speech cannot fully represent evaluation data that include various noisy signals preprocessed by SE. To compensate for this mismatch, we apply SE as a preprocessing block not only in the evaluation stage but also in the training stage. Moreover, we propose combining SE and MC training (SE-MC), in which various sets of features are extracted in the SE domain and a model for each speaker is trained on the mixture of SE-domain features. Under various background noise environments, SE, MC, and SE-MC produced SR error rates of 43.51%, 25.00%, and 20.29%, respectively.
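
As a rough illustration of how a KL-based score can quantify mismatch between training and evaluation features, the sketch below fits a single diagonal-covariance Gaussian to each feature set and computes a symmetric KL divergence in closed form. This is not the intra-KLD defined in the paper, whose exact formulation is not given in the abstract; the function names, the single-Gaussian model, and the synthetic 13-dimensional features are all assumptions made for illustration.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    mu_p, var_p, mu_q, var_q = (
        np.asarray(x, dtype=float) for x in (mu_p, var_p, mu_q, var_q)
    )
    return 0.5 * float(
        np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    )


def symmetric_kld(feats_a, feats_b, eps=1e-8):
    """Symmetric KL between Gaussians fitted to two (frames x dims) feature sets.

    Hypothetical helper: each set is summarized by its per-dimension mean and
    variance; eps keeps the variances strictly positive.
    """
    a = np.asarray(feats_a, dtype=float)
    b = np.asarray(feats_b, dtype=float)
    mu_a, var_a = a.mean(axis=0), a.var(axis=0) + eps
    mu_b, var_b = b.mean(axis=0), b.var(axis=0) + eps
    return kl_diag_gauss(mu_a, var_a, mu_b, var_b) + kl_diag_gauss(
        mu_b, var_b, mu_a, var_a
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(500, 13))  # stand-in for clean training features
    noisy = clean + rng.normal(scale=0.5, size=clean.shape) + 0.3  # mismatched set
    print("clean vs clean:", symmetric_kld(clean, clean))
    print("clean vs noisy:", symmetric_kld(clean, noisy))
```

A larger score indicates a larger gap between the two feature distributions; under the abstract's argument, one would expect training on SE-domain features to shrink this kind of gap relative to training on clean speech alone.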
