Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances

Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi

doi:10.1016/j.csl.2024.101619

Abstract

Conventional automatic speaker verification systems can usually be decomposed into a front-end model, such as a time delay neural network (TDNN), for extracting speaker embeddings and a back-end model, such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA), for similarity scoring. However, optimizing the front-end and back-end models sequentially may lead to a local minimum, which in theory prevents the system as a whole from reaching its optimum. Although some methods have been proposed for jointly optimizing the two models, such as the generalized end-to-end (GE2E) model and the NPLDA E2E model, most of these methods have not fully investigated how to model the intra-relationship among multiple enrollment utterances. In this paper, we propose a new E2E joint method for speaker verification designed specifically for the practical scenario of multiple enrollment utterances. To leverage the intra-relationship among multiple enrollment utterances, our model is equipped with frame-level and utterance-level attention mechanisms. Additionally, focal loss is used to balance the importance of positive and negative samples within a mini-batch and to focus on difficult samples during training. We also use several data augmentation techniques for better optimization, including conventional noise augmentation with the MUSAN and RIRs datasets and a unique speaker embedding-level mixup strategy.
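
As a rough illustration of two ingredients named in the abstract, the sketch below shows the commonly used binary focal loss form and a generic convex-combination mixup applied to embeddings. It is only a minimal sketch under stated assumptions: the function names, the gamma and alpha defaults, and the mixing weight are hypothetical, and the abstract does not say that the paper uses this exact formulation.

```python
import math


def binary_focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single verification trial.

    p: predicted probability that the trial is a target (same-speaker) trial.
    target: 1 for a target trial, 0 for a non-target trial.
    gamma, alpha: illustrative defaults (assumptions, not values from the paper).
    """
    eps = 1e-12  # guards against log(0)
    p_t = p if target == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) trials.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def mixup_embeddings(emb_a, emb_b, lam):
    """Generic mixup: convex combination of two speaker embeddings.

    lam is the mixing weight in [0, 1]; how the paper samples it and mixes
    the labels is not stated in the abstract.
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(emb_a, emb_b)]


# A confidently correct trial contributes far less loss than a hard one.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)

# Mixing two toy 2-dimensional embeddings with weight 0.7.
mixed = mixup_embeddings([1.0, 0.0], [0.0, 1.0], lam=0.7)
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is one way to see how gamma controls the emphasis on difficult samples.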

Journal: Computer Speech & Language
Publication Date: Jan 18, 2024
License type: cc-by-nc-nd

Similar Papers

Privacy-preserving PLDA speaker verification using outsourced secure computation
Amos Treiber ... Christoph Busch
Speech Communication | VOL. 114
01 Oct 2019

ChildAugment: Data augmentation methods for zero-resource children's speaker verification
Vishwanath Pratap Singh ... Tomi Kinnunen
The Journal of the Acoustical Society of America | VOL. 155
01 Mar 2024

ASV-SUBTOOLS: Open Source Toolkit for Automatic Speaker Verification
Fuchuan Tong ... Lin Li
06 Jun 2021

Unifying Cosine and PLDA Back-ends for Speaker Verification
Zhiyuan Peng ... Guanglu Wan
18 Sep 2022
