Abstract

The duration of speech segments can significantly impact the performance of text-independent speaker verification systems. In real world applications which require high accuracy on short utterances, the performance of i-vector speaker verification framework degrades significantly considering that i-vectors extracted from short utterances are less reliable (i.e., uncertainty is higher) than those extracted from long utterances. Therefore, to handle duration variability properly, a more realistic approach seems to be required. This study is an extension to our recently proposed nearest neighbor probabilistic linear discriminant analysis (NN-PLDA) which estimates the parameters of PLDA in i-vector speaker verification framework using a nonparametric form rather than maximum likelihood estimation (MLE) obtained by an EM algorithm, and has been shown to provide superior performance. In NN-PLDA, the between-speaker covariance matrix that represents global information about the speaker variability is replaced with a local estimation computed on a nearest neighbor basis for each target speaker. Compared to their parametric counterparts, the nonparametric between- and within-speaker scatter matrices can better exploit the discriminant information in training data and are more adapted to sample distributions. In this paper, we provide further analysis on the proposed nonparametrically trained PLDA as well as introduce a duration variability modeling technique in the estimation of the within-speaker scatter matrix as to compensate for the effect of limited speech data. We evaluate our approach using core–10sec and 10sec–10sec telephone trial conditions of NIST 2010 SRE as well as on the truncated test utterances in extended core condition with duration less than 10 s. We also present the results obtained by the successful incorporation of NN-PLDA on the recent NIST 2016 speaker recognition evaluation. In all experiments, considerable performance improvement is obtained with the proposed technique compared to a generatively trained PLDA model.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.