Comparative evaluation of maximum a posteriori vector quantization and Gaussian mixture models in speaker verification

Tomi Kinnunen, Juhani Saastamoinen, Ville Hautamäki, Mikko Vinni, Pasi Fränti

doi:10.1016/j.patrec.2008.11.007

https://doi.org/10.1016/j.patrec.2008.11.007

Abstract

The Gaussian mixture model with a universal background model (GMM–UBM) is a standard reference classifier in speaker verification. We have recently proposed a simplified model using vector quantization (VQ–UBM). In this study, we extensively compare these two classifiers on the NIST 2005, 2006 and 2008 SRE corpora, with a standard discriminative classifier (GLDS–SVM) as a point of reference. We focus on parameter settings for N-top scoring, model order, and performance for different amounts of training data. The most interesting result, contrary to general belief, is that GMM–UBM yields better results for short segments, whereas VQ–UBM performs well on long utterances. The results also suggest that maximum likelihood training of the UBM is sub-optimal and that alternative ways to train the UBM should be considered.