Abstract

This paper proposes a simple model for speaker recognition based on i‐vector pairs, and analyzes its similarities and differences with respect to the state‐of‐the‐art Probabilistic Linear Discriminant Analysis (PLDA) and Pairwise Support Vector Machine (PSVM) models. Like the discriminative PSVM approach, the proposed model operates on i‐vector pairs rather than on individual i‐vectors, but it is generative. The model is based on two Gaussian distributions, one for the “same speaker” and the other for the “different speaker” i‐vector pairs, and on the assumption that the i‐vector pairs are independent. This independence assumption allows the distributions of the two classes to be estimated independently. The “Two‐Gaussian” approach can be extended to Heavy‐Tailed distributions, still allowing a fast closed‐form solution to be obtained for testing i‐vector pairs. We show that this model is closely related to the PLDA and PSVM models and that, tested on the female part of the tel‐tel NIST SRE 2010 extended evaluation set, it achieves accuracy comparable to that of the other models, which are trained with different objective functions and training procedures.
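As an illustration of the two‐Gaussian idea, the following is a minimal sketch and not the authors' implementation. It assumes that each pair is represented by stacking its two i‐vectors into one vector, that each class is fitted by plain maximum likelihood with a full covariance matrix, and that a trial is scored by the log‐likelihood ratio between the two classes. The synthetic data, the ridge term, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

def fit_gaussian(pairs):
    """Maximum-likelihood mean and full covariance of stacked pairs, shape (n, 2d)."""
    mu = pairs.mean(axis=0)
    centered = pairs - mu
    cov = centered.T @ centered / len(pairs)
    # Small ridge for numerical stability (an assumption of this sketch).
    cov += 1e-6 * np.eye(cov.shape[0])
    return mu, cov

def log_gaussian(z, mu, cov):
    """Log-density of a multivariate Gaussian evaluated at z."""
    diff = z - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(z) * np.log(2.0 * np.pi) + logdet + maha)

def llr_score(x1, x2, same_model, diff_model):
    """Log-likelihood ratio: 'same speaker' versus 'different speaker'."""
    z = np.concatenate([x1, x2])
    return log_gaussian(z, *same_model) - log_gaussian(z, *diff_model)

def make_toy_pairs(rng, n, d, same_speaker):
    """Synthetic stand-in for i-vector pairs: speaker factor plus session noise."""
    speaker_a = rng.normal(size=(n, d))
    speaker_b = speaker_a if same_speaker else rng.normal(size=(n, d))
    x1 = speaker_a + 0.3 * rng.normal(size=(n, d))
    x2 = speaker_b + 0.3 * rng.normal(size=(n, d))
    return np.hstack([x1, x2])

rng = np.random.default_rng(0)
d = 4  # toy dimensionality; real i-vectors are much larger

# The two classes are estimated independently, each from its own pairs.
same_model = fit_gaussian(make_toy_pairs(rng, 5000, d, same_speaker=True))
diff_model = fit_gaussian(make_toy_pairs(rng, 5000, d, same_speaker=False))

# Score held-out pairs of each kind.
same_scores = np.array([llr_score(z[:d], z[d:], same_model, diff_model)
                        for z in make_toy_pairs(rng, 500, d, same_speaker=True)])
diff_scores = np.array([llr_score(z[:d], z[d:], same_model, diff_model)
                        for z in make_toy_pairs(rng, 500, d, same_speaker=False)])
```

In this toy setting, a positive score favours the “same speaker” hypothesis and a negative score favours the “different speaker” hypothesis.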
