Speaker recognition based on multi-subsystems likelihood scores fusion

Heng-Jie Li

doi:10.3724/sp.j.1087.2008.00116

Abstract

This paper describes a speaker clustering approach based on the fusion of likelihood scores from multiple subsystems, applied to text-independent speaker recognition with short speech data in various telephone microphone channels. The registered speakers were aggregated into clusters using two speaker model similarity measures: Kullback-Leibler divergence (KLD) and generalized likelihood ratio (GLR). A single-layer perceptron network was built for each cluster to fuse the likelihood scores of three subsystems that use MFCC, LPCC and SSFE speaker features, respectively. Because the SSFE system contributes robustness and the other two systems contribute recognition accuracy, the three subsystems complement each other through the fusion network in each cluster. Experimental results on the NIST SRE 05 database show relative equal error rate reductions of 10.3% and 8.7% with KLD and GLR, respectively, compared with a single fusion network shared by all speakers.
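The per-cluster fusion step can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not give the network's activation, training rule, or data layout, so the sigmoid output, the gradient-descent training, the function names `train_fusion_perceptron` and `fuse_scores`, and the synthetic scores below are all assumptions made for illustration. Each row of `scores` stands for one trial, with one likelihood score per subsystem (MFCC, LPCC, SSFE).

```python
import numpy as np

def train_fusion_perceptron(scores, labels, lr=0.1, epochs=500):
    """Fit one single-layer network (assumed: sigmoid unit, cross-entropy loss).

    scores: (n_trials, 3) likelihood scores from the three subsystems.
    labels: (n_trials,) 1.0 for target trials, 0.0 for impostor trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_trials, n_subsystems = scores.shape
    w = np.zeros(n_subsystems)  # one fusion weight per subsystem
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(scores @ w + b)))  # fused score in (0, 1)
        err = p - labels                             # cross-entropy gradient
        w -= lr * (scores.T @ err) / n_trials
        b -= lr * err.mean()
    return w, b

def fuse_scores(scores, w, b):
    """Combine subsystem scores into one fused score per trial."""
    z = np.asarray(scores, dtype=float) @ w + b
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: two speaker clusters, each with its own fusion network.
rng = np.random.default_rng(0)
models = {}
for cluster_id in (0, 1):
    target = rng.normal(1.0, 1.0, size=(200, 3))     # synthetic target scores
    impostor = rng.normal(-1.0, 1.0, size=(200, 3))  # synthetic impostor scores
    scores = np.vstack([target, impostor])
    labels = np.concatenate([np.ones(200), np.zeros(200)])
    models[cluster_id] = train_fusion_perceptron(scores, labels)

# Score a new trial with the network of the cluster the claimed speaker belongs to.
w, b = models[0]
print(fuse_scores(np.array([[0.8, 1.1, 0.3]]), w, b))
```

Keeping one `(w, b)` pair per cluster mirrors the idea in the abstract that speakers with similar models share a fusion network, in contrast to a single network shared by all speakers.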
