Abstract

In the field of speaker verification, probabilistic linear discriminant analysis (PLDA) is the dominant method for back-end scoring. To estimate the PLDA model, the between-class covariance and within-class precision matrices must be estimated from samples. However, the empirical covariance/precision estimated from samples contains estimation errors because only a limited number of samples is available. In this paper, we propose a method that improves the conventional PLDA by estimating the PLDA model with a regularized within-class precision matrix. We use the graphical least absolute shrinkage and selection operator (GLASSO) for the regularization. GLASSO regularization reduces the estimation errors in the empirical precision matrix by making the precision matrix sparse, which reflects the conditional independence structure among the variables. Experimental results on text-dependent speaker verification show that the proposed method reduces the equal error rate by up to 23% relative to the conventional PLDA.
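A minimal sketch of the estimation step described above, assuming the embeddings are rows of a NumPy array `X` with one speaker label per row, and using scikit-learn's `GraphicalLasso` as the GLASSO solver. The function name `estimate_plda_stats` and the penalty value `alpha` are hypothetical illustrative choices; the paper's exact estimator and hyperparameters may differ.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def estimate_plda_stats(X, labels, alpha=0.05):
    """Hypothetical helper: global mean, between-class covariance, and a
    GLASSO-regularized within-class precision (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)
    classes = np.unique(labels)  # sorted speaker identities
    class_means = np.stack([X[labels == c].mean(axis=0) for c in classes])

    # Between-class covariance: spread of speaker means around the global mean
    # (unweighted over speakers; one convention among several)
    D = class_means - mu
    B = D.T @ D / len(classes)

    # Within-class residuals: each embedding minus its own speaker mean
    idx = np.searchsorted(classes, labels)
    R = X - class_means[idx]

    # GLASSO: L1-penalized estimate of the within-class precision matrix;
    # residuals are already zero-mean, so no further centering is needed
    gl = GraphicalLasso(alpha=alpha, assume_centered=True).fit(R)
    return mu, B, gl.precision_
```

Larger `alpha` values drive more off-diagonal entries of the precision matrix to exactly zero; `alpha=0` would correspond to no sparsity penalty.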

Highlights

  • Automatic speaker verification (ASV) is a technique to verify a user’s identity by comparing an utterance of a user with the reference utterance of a known target speaker

  • The dashed red line depicts the equal error rates (EERs) of the original probabilistic linear discriminant analysis (PLDA), and the solid blue line depicts the EERs of the graphical least absolute shrinkage and selection operator (GLASSO)-PLDA

  • We improved the conventional PLDA by proposing the GLASSO-PLDA, in which the GLASSO-regularized within-class precision matrix was used to estimate the PLDA model
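To show where the within-class precision enters the back end, here is a hedged sketch of the standard two-covariance PLDA log-likelihood ratio for one enrollment/test pair. It assumes the embedding model x = mu + y + e, with speaker factor y ~ N(0, B) and residual e ~ N(0, P^-1), where P is the (possibly GLASSO-regularized) within-class precision. This formulation and the helper name `plda_llr` are assumptions for illustration and may not match the PLDA variant used in the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal


def plda_llr(x1, x2, mu, B, P_within):
    """Hypothetical helper: two-covariance PLDA log-likelihood ratio.

    x1, x2   : enrollment and test embeddings, shape (d,)
    mu       : global mean of the embeddings, shape (d,)
    B        : between-class (speaker) covariance, shape (d, d)
    P_within : within-class precision, e.g. a GLASSO estimate, shape (d, d)
    """
    W = np.linalg.inv(P_within)  # within-class covariance implied by the precision
    T = B + W                    # total covariance of a single embedding
    zeros = np.zeros_like(B)
    z = np.concatenate([x1 - mu, x2 - mu])
    mean = np.zeros(z.shape[0])
    # Same speaker: both embeddings share one speaker factor, so they covary by B
    cov_same = np.block([[T, B], [B, T]])
    # Different speakers: the two embeddings are independent
    cov_diff = np.block([[T, zeros], [zeros, T]])
    return (multivariate_normal.logpdf(z, mean=mean, cov=cov_same)
            - multivariate_normal.logpdf(z, mean=mean, cov=cov_diff))
```

A positive score favors the same-speaker hypothesis, and thresholding the score gives the accept/reject decision. Evaluating two dense 2d-dimensional Gaussians is fine for a sketch; practical PLDA back ends usually precompute closed-form quadratic scoring matrices instead.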


Summary

Introduction

Automatic speaker verification (ASV) is a technique to verify a user’s identity by comparing an utterance of the user (test utterance) with the reference utterance of a known target speaker (enrollment utterance). In text-dependent speaker verification (TD-SV), the phrase variability is relatively easy to control because the spoken phrase is restricted. Due to these advantages, TD-SV has been widely used in many real applications that require both high performance and short utterances, such as voice assistants [3,4]. The score is computed in a more discriminative subspace to compensate for the within-class variability of the embeddings [14]. The empirical covariance/precision matrix has estimation errors because of the limited number of available samples (corresponding to the embeddings in our case). We propose a method to improve the performance of conventional PLDA by regularizing the within-class precision matrix used to estimate the PLDA model.
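The estimation-error argument can be checked numerically on synthetic data. The sketch below draws 30 samples from a 20-dimensional Gaussian whose true precision is sparse (tridiagonal), then compares the Frobenius error of the inverted sample covariance with that of a GLASSO estimate from scikit-learn. The helper name `precision_errors`, the dimensions, the chain-structured precision, and `alpha=0.2` are all arbitrary assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def precision_errors(n_samples=30, dim=20, alpha=0.2, seed=0):
    """Hypothetical helper: Frobenius errors of empirical vs GLASSO precision."""
    rng = np.random.default_rng(seed)
    # Sparse ground-truth precision: tridiagonal (chain) structure, positive definite
    P_true = np.eye(dim) + 0.4 * (np.eye(dim, k=1) + np.eye(dim, k=-1))
    C_true = np.linalg.inv(P_true)
    X = rng.multivariate_normal(np.zeros(dim), C_true, size=n_samples)

    # Empirical precision: invert the (maximum-likelihood) sample covariance
    S = np.cov(X, rowvar=False, bias=True)
    P_emp = np.linalg.inv(S)

    # GLASSO precision: L1-penalized maximum-likelihood estimate
    P_gl = GraphicalLasso(alpha=alpha).fit(X).precision_

    def frob_err(P):
        return np.linalg.norm(P - P_true, ord="fro")

    return frob_err(P_emp), frob_err(P_gl)
```

With so few samples relative to the dimension, the inverse of the sample covariance is strongly inflated (its expectation is a scaled-up copy of the true precision), so the sparse GLASSO estimate typically has a much smaller error.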

I-Vector
Deep Speaker Embeddings
Gaussian Markov Random Field
GLASSO
GLASSO Applied PLDA
Prerequisite
Database
Experimental Setup
Results
Evaluation in Text-Independent Speaker Verification
Comparison with Matrix Banding
Conclusions
August