DNN i-vector based Fishervoice and PLDA SVM scoring for NIST SRE 2016

Jinghua Zhong, Helen Meng

doi:10.1109/iscslp.2018.8706606

Abstract

Our ongoing work applying Fishervoice to map joint factor analysis (JFA)-mean supervectors¹ into a compressed discriminant subspace has shown that cosine distance scoring on the Fishervoice-projected vectors outperforms classical JFA. In this paper, we refine Fishervoice for low-dimensional i-vectors by using only the nonparametric between-class scatter matrix in place of the parametric one in linear discriminant analysis (LDA). The 2016 speaker recognition evaluation (SRE16) task provides only unlabeled in-domain data and labeled out-of-domain data for model training. Support vector machine (SVM) scoring can capture the discriminative information embedded in the unlabeled in-domain training data. Before SVM scoring, we perform probabilistic linear discriminant analysis (PLDA) for inter-session compensation, using speaker label information from the out-of-domain training data. This approach constitutes CUHK's submission for SRE16. We present a detailed analysis of the approaches and of the performance gains obtained with refined Fishervoice and PLDA SVM scoring.

¹ The JFA-mean supervector of an utterance is a GMM supervector obtained from the JFA model.
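To make the central idea concrete, the following is a minimal sketch of an LDA-style projection in which the parametric between-class scatter is replaced by a nonparametric one, followed by cosine scoring. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact Fishervoice formulation, so the sketch uses a generic nearest-neighbour (nonparametric discriminant analysis style) scatter without sample weighting. The function names, the neighbour count `k`, and the regularizer `reg` are all hypothetical.

```python
import numpy as np


def within_class_scatter(X, y):
    """Sum of per-speaker scatter matrices around each speaker's mean."""
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = Xc - Xc.mean(axis=0)
        Sw += diff.T @ diff
    return Sw


def nonparametric_between_scatter(X, y, k=3):
    """Between-class scatter built from local means (assumed formulation).

    Each vector is compared with the mean of its k nearest neighbours
    in every other class, instead of with the global class means used
    by parametric LDA. No sample weighting is applied in this sketch.
    """
    dim = X.shape[1]
    Sb = np.zeros((dim, dim))
    classes = np.unique(y)
    for c in classes:
        Xc = X[y == c]
        for other in classes:
            if other == c:
                continue
            Xo = X[y == other]
            kk = min(k, len(Xo))
            # Squared Euclidean distances, shape (n_c, n_other).
            d2 = ((Xc[:, None, :] - Xo[None, :, :]) ** 2).sum(axis=2)
            nearest = np.argsort(d2, axis=1)[:, :kk]
            local_means = Xo[nearest].mean(axis=1)
            diff = Xc - local_means
            Sb += diff.T @ diff
    return Sb


def fit_projection(X, y, n_components, k=3, reg=1e-6):
    """Maximise between-class over within-class scatter via whitening."""
    dim = X.shape[1]
    Sw = within_class_scatter(X, y) + reg * np.eye(dim)
    Sb = nonparametric_between_scatter(X, y, k)
    w_vals, w_vecs = np.linalg.eigh(Sw)
    whiten = w_vecs / np.sqrt(w_vals)  # scales each eigenvector column
    b_vals, b_vecs = np.linalg.eigh(whiten.T @ Sb @ whiten)
    top = np.argsort(b_vals)[::-1][:n_components]
    return whiten @ b_vecs[:, top]


def cosine_score(a, b):
    """Cosine similarity between two projected vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In use, `X` would hold one i-vector per row and `y` the speaker labels from the labeled out-of-domain data. A trial is then scored with `cosine_score(enroll @ W, test @ W)`, where `W = fit_projection(X, y, n_components)`. The PLDA and SVM scoring stages mentioned in the abstract are not covered by this sketch.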

Full Text