Maximum a posteriori linear regression for speaker recognition

Xiang Zhang, Haipeng Wang, Jianping Zhang, Xiang Xiao, Yonghong Yan

doi:10.1109/icassp.2010.5495579

Abstract

Recently, maximum likelihood linear regression (MLLR) transforms have been proposed as features for SVM-based speaker recognition, achieving performance comparable to that of state-of-the-art approaches. In this paper, we focus on calculating the transforms based on a GMM universal background model (UBM). Rather than estimating the transforms under the maximum likelihood criterion, we describe a new feature extraction technique for speaker recognition based on maximum a posteriori linear regression (MAPLR). We further propose a multi-class technique, which clusters the Gaussian mixture components into regression classes and estimates a separate transform for each class. The transforms of all classes for a given utterance are concatenated into a supervector for SVM classification. Experiments on a NIST 2008 SRE corpus show that the speaker recognition system using MAPLR outperforms the one using MLLR, and that the multi-class approach brings further significant gains for the MAPLR system.
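The supervector construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `apply_transform` and `build_supervector`, the layout W = [b A] for each affine transform, and the row-major flattening order are all assumptions made for illustration, and the estimation of the transforms themselves (MLLR or MAPLR) is not shown.

```python
import numpy as np

def apply_transform(means, W):
    """Apply an affine transform W = [b A] of shape (d, d+1) to means of shape (n, d).

    The [b A] layout is an assumption; each mean is extended with a leading 1.
    """
    ext = np.hstack([np.ones((means.shape[0], 1)), means])  # shape (n, d+1)
    return ext @ W.T                                         # shape (n, d)

def build_supervector(transforms):
    """Flatten each per-class transform and concatenate them into one 1-D vector."""
    return np.concatenate([np.asarray(W, dtype=float).ravel() for W in transforms])

# Toy setup (hypothetical sizes): 2 regression classes, 3-dimensional features.
d = 3
identity_W = np.hstack([np.zeros((d, 1)), np.eye(d)])  # b = 0, A = I
transforms = [identity_W.copy() for _ in range(2)]

means = np.arange(6, dtype=float).reshape(2, d)
adapted = apply_transform(means, transforms[0])  # identity transform leaves means unchanged

supervector = build_supervector(transforms)      # length = classes * d * (d + 1)
```

With C regression classes and d-dimensional features, the resulting vector has C·d·(d+1) entries, so the SVM input dimension grows linearly with the number of classes.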
