Abstract

In i-vector based speaker recognition systems, back-end classifiers are trained to factor out nuisance information and retain only the speaker identity. As a result, variability arising from gender, language, and accent (among other factors) is suppressed. Inter-task fusion, in which such metadata information obtained from automatic systems is reintroduced, has been shown to improve speaker recognition performance. In this paper, we explore a Bayesian approach to inter-task fusion. The speaker similarity score for a test recording is obtained by marginalizing the posterior probability of a speaker over the metadata classes. Gender and language probabilities for the test audio are combined with the corresponding speaker posteriors to obtain a final speaker score. The proposed approach is demonstrated for speaker verification and speaker identification tasks on the NIST SRE 2008 dataset. Relative improvements of up to 10% and 8% are obtained when fusing gender and language information, respectively.
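The marginalization described above can be sketched as follows. This is a minimal illustration of the fusion rule, not the authors' implementation; the function name and all probability values are hypothetical.

```python
# Sketch of Bayesian inter-task fusion: the final speaker score is the
# metadata-conditional speaker posterior marginalized over the metadata
# variable (here, gender). All numeric values are hypothetical.

def fuse_speaker_score(metadata_posteriors, conditional_speaker_scores):
    """Marginalize the speaker score over metadata classes:
    P(spk | x) = sum_m P(m | x) * P(spk | x, m)."""
    return sum(metadata_posteriors[m] * conditional_speaker_scores[m]
               for m in metadata_posteriors)

# Posteriors from a hypothetical automatic gender classifier for the test audio.
gender_post = {"male": 0.9, "female": 0.1}
# Speaker posteriors from hypothetical gender-dependent back-end classifiers.
spk_post = {"male": 0.7, "female": 0.2}

fused = fuse_speaker_score(gender_post, spk_post)
# fused = 0.9 * 0.7 + 0.1 * 0.2 = 0.65
```

The same rule applies unchanged when the metadata variable is language instead of gender; only the set of classes and their posteriors differ.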
