Abstract

We compare speaker verification systems based on unsupervised and supervised mixtures of probabilistic linear discriminant analysis (PLDA) models. We explore the current applicability of unsupervised mixtures of PLDA models with Gaussian priors in a total variability space for speaker verification, and we analyze the experimental conditions under which this approach is advantageous, taking into account the limited size of the training databases provided by the National Institute of Standards and Technology (NIST). We also present a full derivation of the maximum likelihood learning procedure for the PLDA mixture. Experimental results on a cross-channel NIST Speaker Recognition Evaluation (SRE) 2010 verification task show that the unsupervised PLDA mixture is more effective than other state-of-the-art methods. In particular, for this task a combination of a homogeneous i-vector extractor and a mixture of two Gaussian PLDA models is more effective than a cross-channel i-vector extractor with a single Gaussian PLDA model.
