Abstract
In typical x-vector-based speaker recognition systems, standard linear discriminant analysis (LDA) is used to transform the x-vector space so as to maximise the between-speaker discriminant information while minimising the within-speaker variability. It is customary to estimate LDA from all the available speakers in the speaker recognition development dataset. In this study, the authors investigate whether, in LDA-based channel compensation, it is more beneficial to estimate the between-speaker discriminant information from the most confusing samples and the within-speaker variability from the samples most distant from the target speaker mean. The between-speaker variance is estimated using a pairwise approach in which the most confusing non-target speaker samples are found from the Euclidean distance between the speaker mean and the adjacent speakers' samples. The within-speaker variance is estimated using the mean of each speaker and the farthest samples in that speaker's sessions. Experimental results demonstrate that the proposed LDA approach for an x-vector-based speaker recognition system achieves over 17% relative improvement in equal error rate (EER) over standard LDA-based x-vector speaker recognition systems on the NIST2010 corext-corext condition.
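The sample selection described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives neither the number of samples selected nor the exact pairwise formula, so the function names, the parameters `k_between`, `k_within` and `reg`, and the choice to accumulate outer products of the deviations from each speaker mean are all assumptions made for illustration.

```python
import numpy as np

def selective_scatter(X, y, k_between=5, k_within=5):
    """Sketch of sample-selective scatter matrices (names and k values are assumed).

    X: (n, d) array of embeddings, y: (n,) array of speaker labels.
    """
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for s in np.unique(y):
        own = X[y == s]
        others = X[y != s]
        mu = own.mean(axis=0)

        # "Most confusing" samples: non-target samples closest to this speaker's mean.
        dist_others = np.linalg.norm(others - mu, axis=1)
        diff_b = others[np.argsort(dist_others)[:k_between]] - mu
        Sb += diff_b.T @ diff_b

        # "Most distant" samples: this speaker's own samples farthest from the mean.
        dist_own = np.linalg.norm(own - mu, axis=1)
        diff_w = own[np.argsort(dist_own)[-k_within:]] - mu
        Sw += diff_w.T @ diff_w
    return Sb, Sw

def lda_projection(Sb, Sw, n_components, reg=1e-6):
    """Solve Sb v = lambda Sw v by whitening with the Cholesky factor of Sw."""
    d = Sw.shape[0]
    # A small ridge (assumed value) keeps Sw positive definite.
    L = np.linalg.cholesky(Sw + reg * np.eye(d))
    L_inv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    # Map the eigenvectors back to the original space; columns are LDA directions.
    return L_inv.T @ U[:, order]
```

Under these assumptions, embeddings would be projected with `X @ W`, where `W = lda_projection(*selective_scatter(X, y), n_components=...)`. Setting `k_between` and `k_within` large enough to include every sample removes the selection, which is one way to compare against a baseline that uses all samples.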