Abstract

In this paper, we advocate the use of the uncompressed form of the i-vector, which we refer to as the i-supervector, and rely on subspace modeling with probabilistic linear discriminant analysis (PLDA) to handle speaker and session (or channel) variability. An i-vector is a low-dimensional vector containing both speaker and channel information extracted from a speech segment. When PLDA is applied to an i-vector, dimension reduction is performed twice: first in the i-vector extraction process and again in the PLDA model. Keeping the full dimensionality of the i-vector in the i-supervector space for PLDA modeling and scoring avoids this unnecessary loss of information. The drawback of using the i-supervector with PLDA is the inversion of large matrices in the estimation of the full posterior distribution, which we show can be handled rather efficiently by partitioning the large matrices into smaller blocks. We also introduce the Gaussianized rank-norm, as an alternative to whitening, for feature normalization prior to PLDA modeling. We found that the i-supervector benefits more from such normalization, and that a better performance is obtained by combining the i-supervector and i-vector at the score level. Furthermore, we analyze the computational complexity of the i-supervector system, compared with that of the i-vector system, at four stages: loading-matrix estimation, posterior extraction, PLDA modeling, and PLDA scoring.
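The abstract's key computational trick, inverting a large matrix by partitioning it into smaller blocks, can be illustrated with the standard Schur-complement identity. This is a generic sketch of blockwise inversion, not the paper's exact algorithm; the partition size `k` and function name are illustrative.

```python
import numpy as np

def blockwise_inverse(M, k):
    """Invert M by partitioning it as [[A, B], [C, D]], where A is
    k-by-k, using the Schur complement S = D - C A^{-1} B.
    Illustrative sketch of block inversion, not the paper's method."""
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    Ainv = np.linalg.inv(A)
    S = D - C @ Ainv @ B                 # Schur complement of A in M
    Sinv = np.linalg.inv(S)
    top_left = Ainv + Ainv @ B @ Sinv @ C @ Ainv
    top_right = -Ainv @ B @ Sinv
    bottom_left = -Sinv @ C @ Ainv
    return np.block([[top_left, top_right],
                     [bottom_left, Sinv]])
```

Only the two smaller blocks `A` and `S` are ever inverted directly, which is what makes the approach attractive when the full matrix is large but its blocks have exploitable structure.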

Highlights

  • Recent research in text-independent speaker verification has focused on compensating for the mismatch between training and test speech segments.

  • We have introduced the use of the uncompressed form of the i-vector for probabilistic linear discriminant analysis (PLDA)-based speaker verification.

  • We introduced the use of the Gaussianized rank-norm for feature normalization prior to PLDA modeling.
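The Gaussianized rank-norm highlighted above can be sketched as follows: each feature dimension is replaced by its rank among reference data, and the ranks are mapped through the standard-normal inverse CDF. This is a plausible reconstruction of rank-norm Gaussianization; the paper's exact variant (e.g., its tie handling or reference set) may differ.

```python
import numpy as np
from scipy.stats import norm

def gaussianized_rank_norm(X, X_ref=None):
    """Per-dimension rank normalization followed by the standard-normal
    inverse CDF. X: (n_samples, n_dims). X_ref: reference data used to
    compute ranks (defaults to X itself). Illustrative sketch only."""
    if X_ref is None:
        X_ref = X
    n = X_ref.shape[0]
    out = np.empty(X.shape, dtype=float)
    for d in range(X.shape[1]):
        # rank of each value among the sorted reference values
        ranks = np.searchsorted(np.sort(X_ref[:, d]), X[:, d], side="right")
        u = ranks / (n + 1.0)                 # map ranks into (0, 1)
        out[:, d] = norm.ppf(np.clip(u, 1e-6, 1 - 1e-6))
    return out
```

Unlike whitening, which only corrects first- and second-order statistics, this transform forces each dimension's marginal distribution toward a standard Gaussian regardless of its original shape.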


Summary

Introduction

Recent research in text-independent speaker verification has focused on the problem of compensating for the mismatch between training and test speech segments. The advantage of the i-vector is that it represents a speech segment as a fixed-length vector instead of a variable-length sequence of acoustic features, which greatly simplifies the modeling and scoring processes in speaker verification. We can assume that the i-vector is generated from a single Gaussian density [13] instead of a mixture of Gaussian densities, as is usual in the case of acoustic features [7]. In this regard, linear discriminant analysis (LDA) [13,26,27], nuisance attribute projection (NAP) [8,13,28], within-class covariance normalization (WCCN) [13,29,30], probabilistic LDA (PLDA) [10,31], and heavy-tailed PLDA [32] have been shown to be effective for such fixed-length data.
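To make the single-Gaussian assumption concrete, a PLDA verification score for a pair of fixed-length vectors can be written as a log-likelihood ratio between the same-speaker and different-speaker hypotheses. The sketch below uses the two-covariance formulation (between-speaker covariance `B`, within-speaker covariance `W`); it is a minimal illustration under these assumptions, not the paper's exact scoring formula.

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_llr(x1, x2, B, W, mu):
    """Two-covariance PLDA log-likelihood ratio for a trial (x1, x2).
    Same-speaker hypothesis: the pair shares one speaker factor, so the
    vectors are correlated through B. Different-speaker hypothesis: the
    vectors are independent. Illustrative sketch only."""
    d = len(mu)
    T = B + W                                  # total covariance of one vector
    x = np.concatenate([x1 - mu, x2 - mu])
    same = np.block([[T, B], [B, T]])          # shared speaker factor
    diff = np.block([[T, np.zeros((d, d))],
                     [np.zeros((d, d)), T]])   # independent speakers
    return (multivariate_normal(np.zeros(2 * d), same).logpdf(x)
            - multivariate_normal(np.zeros(2 * d), diff).logpdf(x))
```

Because both hypotheses are plain Gaussians over the concatenated pair, the score is available in closed form, which is exactly the simplification the fixed-length i-vector representation buys.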

I-vector paradigm
Probabilistic LDA
M-step: model estimation
Model comparison
I-supervector pre-conditioning
Experiment
Channel factors in i-supervector space
Findings
Conclusions
