Maximum-Likelihood Linear Transformation for Unsupervised Domain Adaptation in Speaker Verification

Abhinav Misra, John H L Hansen

doi:10.1109/taslp.2018.2831460

Abstract

Recent advances in front-end factor analysis through the development of i-Vectors have led to significant gains in speaker recognition technology. However, mismatch between the domains of the system development data and the evaluation data remains a challenging problem. This domain mismatch arises primarily from differences between the sources of the development and evaluation data. In this study, we propose a novel method of unsupervised probabilistic feature transformation (UPFT) that reduces this domain mismatch by transforming out-of-domain development data toward in-domain development data. We formulate the alignment of the two domains as a probability density estimation problem. We first train a Gaussian mixture model (GMM) on the out-of-domain i-Vectors. Next, we employ an expectation–maximization (EM) algorithm to fit the means of the GMM to the in-domain i-Vectors by maximizing the overall likelihood. At the optimum, the two domains lie closer to each other in the i-Vector space. Over the EM iterations that reach this optimum, we reparameterize the centroid locations with the following transformation parameters: rotation, translation, and scaling. These parameters, obtained during the optimization, are then used to transform the out-of-domain i-Vectors toward the in-domain i-Vectors. We observe that this transformation improves the performance of the out-of-domain speaker recognition system. The proposed method has the added advantage of being completely unsupervised and free of tuning parameters. We conduct experiments on both the 2013 Domain Adaptation Challenge corpus and the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation 2016 (SRE-2016) corpus. On both corpora, the proposed UPFT solution yields significant improvements. Specifically, on the SRE-2016 corpus, a cosine distance scoring based system recovers almost 90% of the performance gap between the in-domain and out-of-domain systems.
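
The abstract describes fitting GMM centroids trained on out-of-domain i-Vectors to in-domain i-Vectors with EM, while constraining the centroids to move only by a rotation, translation, and scaling. The sketch below assumes this behaves like rigid Coherent Point Drift (Myronenko and Song), a swapped-in technique rather than the authors' published code. It further assumes equal mixture weights, one shared isotropic variance, and an optional uniform outlier weight `w`. The names `sqdist`, `rigid_em_align`, and `apply_transform` are hypothetical.

```python
import numpy as np


def sqdist(A, B):
    """Pairwise squared Euclidean distances, shape (len(A), len(B))."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)


def rigid_em_align(X, Y, w=0.0, max_iter=100, tol=1e-8):
    """CPD-style EM fit of GMM centroids Y (M x D) to points X (N x D).

    Hypothetical helper; an assumed stand-in for the paper's EM.
    Returns (R, s, t) so that s * Y @ R.T + t lies close to X.
    Assumes equal mixture weights and one shared isotropic variance sigma2.
    """
    N, D = X.shape
    M = Y.shape[0]
    R, s, t = np.eye(D), 1.0, np.zeros(D)
    sigma2 = sqdist(Y, X).sum() / (D * M * N)      # initial variance
    for _ in range(max_iter):
        # E-step: posterior P[m, n] that centroid m generated point n,
        # computed in the log domain to avoid underflow as sigma2 shrinks.
        log_num = -sqdist(s * Y @ R.T + t, X) / (2.0 * sigma2)   # M x N
        rows = [log_num]
        if w > 0:
            # Optional uniform outlier component with weight w.
            log_c = (0.5 * D * np.log(2.0 * np.pi * sigma2)
                     + np.log(w / (1.0 - w)) + np.log(M / N))
            rows.append(np.full((1, N), log_c))
        stacked = np.vstack(rows)
        mx = stacked.max(axis=0)
        log_den = mx + np.log(np.exp(stacked - mx).sum(axis=0))
        P = np.exp(log_num - log_den)

        # M-step: weighted Procrustes problem for rotation, scale, translation.
        pn, pm = P.sum(axis=0), P.sum(axis=1)      # per-point, per-centroid mass
        Np = pm.sum()
        mu_x, mu_y = pn @ X / Np, pm @ Y / Np
        Xc, Yc = X - mu_x, Y - mu_y
        A = Xc.T @ P.T @ Yc                        # D x D cross-covariance
        U, _, Vt = np.linalg.svd(A)
        C = np.eye(D)
        C[-1, -1] = np.linalg.det(U @ Vt)          # forbid reflections
        R = U @ C @ Vt
        trAR = np.trace(A.T @ R)
        s = trAR / (pm @ (Yc ** 2).sum(1))
        t = mu_x - s * R @ mu_y
        new_sigma2 = max((pn @ (Xc ** 2).sum(1) - s * trAR) / (Np * D), 1e-12)
        converged = abs(sigma2 - new_sigma2) < tol
        sigma2 = new_sigma2
        if converged:
            break
    return R, s, t


def apply_transform(V, R, s, t):
    """Map out-of-domain vectors (rows of V) toward the in-domain space."""
    return s * V @ R.T + t
```

In a UPFT-style setting, `X` would hold in-domain i-Vectors and `Y` the means of a GMM trained on out-of-domain i-Vectors (for example `sklearn.mixture.GaussianMixture(n_components=K).fit(V_ood).means_`). The learned `(R, s, t)` is then applied to every out-of-domain i-Vector with `apply_transform(V_ood, R, s, t)`. Because the centroids are tied to one global similarity transform rather than moving freely, the same mapping can be reused on individual out-of-domain i-Vectors.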
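
The headline result is stated for a cosine distance scoring (CDS) system and as a fraction of the in-domain vs. out-of-domain performance gap. A minimal sketch of both quantities follows. The gap formula is an assumed reading (relative reduction of a single error metric such as EER), and the function names are hypothetical. A full CDS back-end often applies projections such as LDA and WCCN before scoring; that step is omitted here.

```python
import numpy as np


def cosine_score(w1, w2):
    """Cosine distance score between two i-Vectors (higher means more similar)."""
    w1, w2 = np.asarray(w1, dtype=float), np.asarray(w2, dtype=float)
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))


def gap_recovered(err_out, err_adapted, err_in):
    """Fraction of the in-domain vs. out-of-domain gap closed by adaptation.

    Hypothetical helper. err_* are the same error metric (e.g. EER) for the
    out-of-domain, adapted, and in-domain systems; 1.0 means fully closed.
    """
    return (err_out - err_adapted) / (err_out - err_in)
```

With arbitrary placeholder values (not results from the paper), `gap_recovered(10.0, 6.0, 5.0)` returns 0.8, i.e. 80% of the gap is recovered.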
