Abstract

In this paper, we advocate the use of the uncompressed form of the i-vector, which we refer to as the i-supervector, and rely on subspace modeling with probabilistic linear discriminant analysis (PLDA) to handle speaker and session (or channel) variability. An i-vector is a low-dimensional vector containing both speaker and channel information extracted from a speech segment. When PLDA is applied to an i-vector, dimension reduction is performed twice: first in the i-vector extraction process and again in the PLDA model. Keeping the full dimensionality of the i-vector in the i-supervector space for PLDA modeling and scoring avoids this unnecessary loss of information. The drawback of using the i-supervector with PLDA is the inversion of large matrices in the estimation of the full posterior distribution, which we show can be handled rather efficiently by partitioning the large matrices into smaller blocks. We also introduce the Gaussianized rank-norm, as an alternative to whitening, for feature normalization prior to PLDA modeling. We found that the i-supervector benefits more from such normalization, and that a better performance is obtained by combining the i-supervector and i-vector at the score level. Furthermore, we analyze the computational complexity of the i-supervector system, compared with that of the i-vector system, at four stages: loading-matrix estimation, posterior extraction, PLDA modeling, and PLDA scoring.
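The abstract's key computational trick, inverting a large matrix by partitioning it into smaller blocks, can be illustrated with the standard Schur-complement identity. This is a generic sketch of blockwise inversion, not the paper's exact algorithm; the partition size `k` and function name are illustrative.

```python
import numpy as np

def blockwise_inverse(M, k):
    """Invert M by partitioning it as [[A, B], [C, D]], where A is
    k-by-k, using the Schur complement S = D - C A^{-1} B.
    Illustrative sketch of block inversion, not the paper's method."""
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    Ainv = np.linalg.inv(A)
    S = D - C @ Ainv @ B                 # Schur complement of A in M
    Sinv = np.linalg.inv(S)
    top_left = Ainv + Ainv @ B @ Sinv @ C @ Ainv
    top_right = -Ainv @ B @ Sinv
    bottom_left = -Sinv @ C @ Ainv
    return np.block([[top_left, top_right],
                     [bottom_left, Sinv]])
```

Only the two smaller blocks `A` and `S` are ever inverted directly, which is what makes the approach attractive when the full matrix is large but its blocks have exploitable structure.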

Highlights

  • Recent research in text-independent speaker verification has focused on compensating for the mismatch between training and test speech segments.

  • We have introduced the use of the uncompressed form of the i-vector for probabilistic linear discriminant analysis (PLDA)-based speaker verification.

  • We introduced the use of the Gaussianized rank-norm for feature normalization prior to PLDA modeling.
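The Gaussianized rank-norm highlighted above can be sketched as follows: each feature dimension is replaced by its rank among reference data, and the ranks are mapped through the standard-normal inverse CDF. This is a plausible reconstruction of rank-norm Gaussianization; the paper's exact variant (e.g., its tie handling or reference set) may differ.

```python
import numpy as np
from scipy.stats import norm

def gaussianized_rank_norm(X, X_ref=None):
    """Per-dimension rank normalization followed by the standard-normal
    inverse CDF. X: (n_samples, n_dims). X_ref: reference data used to
    compute ranks (defaults to X itself). Illustrative sketch only."""
    if X_ref is None:
        X_ref = X
    n = X_ref.shape[0]
    out = np.empty(X.shape, dtype=float)
    for d in range(X.shape[1]):
        # rank of each value among the sorted reference values
        ranks = np.searchsorted(np.sort(X_ref[:, d]), X[:, d], side="right")
        u = ranks / (n + 1.0)                 # map ranks into (0, 1)
        out[:, d] = norm.ppf(np.clip(u, 1e-6, 1 - 1e-6))
    return out
```

Unlike whitening, which only corrects first- and second-order statistics, this transform forces each dimension's marginal distribution toward a standard Gaussian regardless of its original shape.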


Summary

Introduction

Recent research in text-independent speaker verification has focused on the problem of compensating for the mismatch between training and test speech segments. The advantage of the i-vector is that it represents a speech segment as a fixed-length vector instead of a variable-length sequence of acoustic features, which greatly simplifies the modeling and scoring processes in speaker verification. We can assume that the i-vector is generated from a single Gaussian density [13] instead of a mixture of Gaussian densities, as is usual in the case of acoustic features [7]. In this regard, linear discriminant analysis (LDA) [13,26,27], nuisance attribute projection (NAP) [8,13,28], within-class covariance normalization (WCCN) [13,29,30], probabilistic LDA (PLDA) [10,31], and heavy-tailed PLDA [32] have been shown to be effective for such fixed-length data.
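To make the single-Gaussian assumption concrete, a PLDA verification score for a pair of fixed-length vectors can be written as a log-likelihood ratio between the same-speaker and different-speaker hypotheses. The sketch below uses the two-covariance formulation (between-speaker covariance `B`, within-speaker covariance `W`); it is a minimal illustration under these assumptions, not the paper's exact scoring formula.

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_llr(x1, x2, B, W, mu):
    """Two-covariance PLDA log-likelihood ratio for a trial (x1, x2).
    Same-speaker hypothesis: the pair shares one speaker factor, so the
    vectors are correlated through B. Different-speaker hypothesis: the
    vectors are independent. Illustrative sketch only."""
    d = len(mu)
    T = B + W                                  # total covariance of one vector
    x = np.concatenate([x1 - mu, x2 - mu])
    same = np.block([[T, B], [B, T]])          # shared speaker factor
    diff = np.block([[T, np.zeros((d, d))],
                     [np.zeros((d, d)), T]])   # independent speakers
    return (multivariate_normal(np.zeros(2 * d), same).logpdf(x)
            - multivariate_normal(np.zeros(2 * d), diff).logpdf(x))
```

Because both hypotheses are plain Gaussians over the concatenated pair, the score is available in closed form, which is exactly the simplification the fixed-length i-vector representation buys.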

I-vector paradigm
Probabilistic LDA
M-step: model estimation
Model comparison
I-supervector pre-conditioning
Experiment
Channel factors in i-supervector space
Findings
Conclusions
